Abstract

Quantum theory was discovered in an adventurous way, under the urge to 
solve puzzles—like the spectrum of the blackbody radiation—that haunted the 
physics community at the beginning of the 20th century. It soon became clear, 
though, that quantum theory was not just a theory of specific physical systems, 
but rather a new language of universal applicability. Can this language be
reconstructed from first principles? Can we arrive at it from logical reasoning,
instead of ad hoc guesswork? A positive answer was provided in Refs. [l|, 0, 
where we put forward six principles that identify quantum theory uniquely in 
a broad class of theories. We first defined a class of “theories of information”, 
constructed as extensions of probability theory in which events can be
connected into networks. In this framework, we formulated the six principles as
rules governing the control and the accessibility of information. Directly from
these rules, we reconstructed a number of quantum information features, and
eventually, the whole Hilbert space framework. In short, our principles
characterize quantum theory as the theory of information that allows for maximal
control of randomness. 


1. Introduction 

Quantum foundations is an old field—as old as quantum mechanics itself. 
Among the early works, the seminal papers by Einstein, Podolsky, and
Rosen and Schrödinger [^ stand out: they addressed quantum entanglement for the
first time, exploring quantum mechanics within the Hilbert space formulation. 
Almost at the same time, Birkhoff and von Neumann looked at the theory 
in a wider framework allowing for alternative theories. From that angle, it was 
natural to ask what is special about quantum mechanics and why Nature obeys 
its peculiar laws. The take of Birkhoff and von Neumann was that quantum 
theory should be regarded as a new form of logic, whose laws could be derived 
from physically motivated axioms. This programme gave rise to the tradition 
of quantum logic SSii0, whose ramifications are still the object of active
research nil.


Researchers in quantum logic managed to derive a significant part of the 
quantum framework from logical axioms. However, there is a general consensus 
that the axioms put forward in this context are not as insightful as one would 
have hoped. For both experts and non-experts, it is hard to figure out what is 
the moral of the quantum-logic axiomatizations. What is special about quantum
theory after all? Why should quantum theory be preferred to alternative
theories? Not many answers can be found in the popular accounts of quantum
logic (see e.g. the Wikipedia entry [l^) and even understanding what the
axioms are requires delving into a highly specialized literature. 

The ambition to find a more insightful axiomatization reemerged with the 
rise of quantum information. The new field showed that the mathematical 
axioms of quantum theory imply striking operational consequences, such as 
quantum key distribution 13^ 141, quantum algorithms 15. 16|. no-cloning [13 
d^ . quantum teleportation and dense coding [ 2 ^. A natural question is: 
Can we reverse the implication and derive the mathematics of quantum theory 
from some of its operational consequences? This question is at the core of 
a research programme launched by Fuchs 2l| and Brassard 22|, which can 
be synthesized by the motto “quantum foundations in the light of quantum 
information” 0. The ultimate goal of the programme is to reconstruct the 
whole structure of quantum theory from few simple principles of information- 
theoretic nature. 

One may wonder why quantum information theorists should be more successful
than their predecessors in the axiomatic endeavour. A good reason is the
following: in the pre-quantum information era, quantum theory was viewed as
an impoverished version of classical theory, lacking the ability to make
deterministic predictions about the outcomes of experiments. Clearly, this perspective
offered no vantage point for explaining why the world should be quantum. By
contrast, quantum information provided plenty of positive reasons for preferring
quantum theory to its classical counterpart. Turning some of these reasons into
axioms then appeared as a promising route towards a compelling axiomatization.
Pioneering works along this route are those by Hardy [ 2 ^ and D’Ariano
26l . l27l |. More recently, the programme flourished, leading to an explosion of


^This was also the title of one influential conference, held in May 2000 at the Université
de Montréal [24], which kickstarted the new wave of quantum axiomatizations.
new axiomatizations 0 M, m, 0, mi, m m.

Here we review the axiomatization of Refs. 0. In this work, quantum
theory is derived from six principles, formulated in a general framework of theories
of information. The first five principles (Causality, Purity of Composition,
Local Discriminability, Perfect State Discrimination, and Ideal Compression)
express ordinary properties that are shared by quantum and classical information
theory: such principles define what we call a standard theory of information.
Among all standard theories of information, the sixth principle,
Purification, identifies quantum theory uniquely. Purification states that every
random preparation can be simulated via a non-random preparation procedure,
in which the system is manipulated together with an environment. An agent
that has access to both the system and the environment would then have maximal
control of the preparation, maximal in the sense that no other agent could
conceivably have higher control. The moral of our work is that quantum theory
is the theory that allows maximal control of randomness, giving us, at least
in principle, the power to control all possible preparations and all possible
dynamics.

The chapter is structured as follows: in section 2 we provide an introduction
to the framework of operational-probabilistic theories, general theories of information
arising from the combination of the circuit framework with probability
theory. Then, section 3 presents the background to the reconstruction, discussing
the main standing assumptions (finite-dimensionality, non-determinism,
and closure under limits) and introducing a few basic operational tasks: signalling,
collecting side information, doing state tomography, distinguishing states,
compressing information, and simulating preparations. The principles are then
analyzed in section 4. Section 5 provides a guided tour through the main results
in our reconstruction, showing how the main features of quantum theory can
be derived directly from the principles. Finally, the conclusions are drawn in
section 6.


2. Operational-probabilistic theories 


In order to reconstruct quantum theory and the features of quantum information,
one needs a framework capable of describing a variety of alternative theories.
Different frameworks have been proposed for this purpose, under the broad name
of general probabilistic theories |25l [s^ [2^, [l^, S, [S, HI, 0 EM 38, EM- Our 
reconstruction is based on a specific variant of general probabilistic theories, 
which we call operational-probabilistic theories (OPTs) [lllMl. OPTs are an
extension of probability theory, in which events can be connected into circuits.
Technically, OPTs arise from the combination of the categorical framework of
Abramsky and Coecke 4^, 41, with the toolbox of elementary probability
theory. We regard such a combination as the natural mathematical object
describing a “general theory of information”. In the following we present a concise
summary of the OPT framework. 
2.1. Operational structure

2.1.1. Systems 

Systems are labels, which determine how different events can be connected 
to one another. We denote systems by capital letters, such as A,B,C, and so 
on. The letter I will be reserved for the trivial system, representing “nothing” 
0. The set of all systems under consideration will be denoted by Sys. 

Every two systems A and B can be considered together as a composite
system, denoted by $A \otimes B$. The composition of systems is associative, namely

$$A \otimes (B \otimes C) = (A \otimes B) \otimes C \qquad \forall A, B, C \tag{1}$$

and has the trivial system as identity element, namely

$$A \otimes I = I \otimes A = A \qquad \forall A. \tag{2}$$

The second condition means that considering system A together with “nothing”
is the same as considering system A alone.
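
As a minimal sketch of conditions (1) and (2), one can model a system as a tuple of elementary labels, with the empty tuple playing the role of the trivial system. This encoding (the names `System`, `tensor`, and `I`) is an illustrative assumption, not part of the framework itself.

```python
from typing import Tuple

# Hypothetical encoding: a system is a tuple of elementary labels.
System = Tuple[str, ...]

# The trivial system I ("nothing") is the empty tuple.
I: System = ()

def tensor(a: System, b: System) -> System:
    """Compose two systems by concatenating their labels."""
    return a + b

A, B, C = ("A",), ("B",), ("C",)

# Eq. (1): composition of systems is associative.
assert tensor(A, tensor(B, C)) == tensor(tensor(A, B), C)
# Eq. (2): the trivial system is the identity element.
assert tensor(A, I) == tensor(I, A) == A
```

Tuple concatenation is strictly associative and has the empty tuple as a strict unit, which is why it serves as a convenient toy model here.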


2.1.2. Events 

An event of type $A \to B$ represents the occurrence of a transformation that
converts the input system A into the output system B. An event $\mathcal{E}$ of type
$A \to B$ will be represented graphically as

[diagram omitted: a box labelled $\mathcal{E}$ with input wire A and output wire B]

The set of all events of type $A \to B$ will be denoted by $\mathrm{Transf}(A \to B)$,
identifying events with the corresponding transformations.

When the input and output systems are composite systems, we draw boxes
with multiple wires. For example, the box

[diagram omitted: a box with input wires A, C and output wires B, D]

represents an event of type $(A \otimes C) \to (B \otimes D)$.

Some types of events are particularly important and deserve a name of their
own. An event of type $I \to A$ is a preparation-event (or simply, a preparation),
that is, an event that makes system A available to further processing. An event
of type $A \to I$ is an observation-event (or simply, an observation), after which
system A is no longer available. Preparation- and observation-events will be
represented as

[diagrams omitted: a preparation $\rho$ with output wire A, and an observation $a$ with input wire A]

respectively. We will often use the Dirac-like notation $(a|$ and $|\rho)$ for the observation
$a$ and the preparation $\rho$, respectively.

^More precisely, “nothing that the theory cares to describe”.

Events of type $I \to I$ will be called scalars [i^. Scalars will be represented
“out of the box”, as

$s$ := [diagram omitted]

Later, scalars will be associated to probabilities. For the moment, however, they 
are just a special type of events. 


2.1.3. Composition of events 

Events can be connected into networks through the following operations:

1. Sequential composition: an event of type $A \to B$ can be connected to an
event of type $B \to C$, yielding an event of type $A \to C$.

2. Parallel composition: an event of type $A \to A'$ can be composed with an
event of type $B \to B'$, yielding an event of type $(A \otimes B) \to (A' \otimes B')$.

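Intuitively, the sequential composition represents two events happening at
“subsequent time steps”. The sequential composition of two events $\mathcal{E}$ and $\mathcal{F}$
of matching types is denoted by $\mathcal{F} \circ \mathcal{E}$ and is represented graphically as

[diagram omitted: box $\mathcal{E}$ (from A to B) followed by box $\mathcal{F}$ (from B to C), equated with a single box $\mathcal{F} \circ \mathcal{E}$ from A to C]
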
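This graphical notation is justified by the requirement that sequential
composition be associative, namely

$$\mathcal{G} \circ (\mathcal{F} \circ \mathcal{E}) = (\mathcal{G} \circ \mathcal{F}) \circ \mathcal{E}, \tag{3}$$

for arbitrary events $\mathcal{E}$, $\mathcal{F}$ and $\mathcal{G}$ of matching types. In addition to associativity,
sequential composition is required to have an identity element for every system.
The identity on system A, denoted by $\mathcal{I}_A$, is the special event of type $A \to A$
identified by the conditions

$$\mathcal{E} \circ \mathcal{I}_A = \mathcal{E} \tag{4}$$

$$\mathcal{I}_A \circ \mathcal{F} = \mathcal{F} \tag{5}$$

required to be valid for arbitrary systems A, B and arbitrary events $\mathcal{E}$ and $\mathcal{F}$
of types $A \to B$ and $B \to A$, respectively. The intuitive content of the above
equations is that $\mathcal{I}_A$ represents the process that “does nothing on the system”.
Consistently, we use the graphical notation

[diagram omitted: the identity $\mathcal{I}_A$ drawn as a bare wire labelled A]
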
^Per se, the mathematical formalism does not force us to interpret the order of sequential 
composition as an order in time. Nevertheless, composition in time is the reference situation 
that we will have in mind when phrasing our axioms. 


Mathematically, conditions (3), (4), and (5) impose that the events form a
category [43, 44], in which the systems are the objects and the events are the
arrows. For the sequential composition of a preparation and an observation we
will often use the Dirac-like notation

$$(a|\rho) := a \circ \rho. \tag{6}$$

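Let us consider parallel composition. The parallel composition of two events
$\mathcal{E}$ and $\mathcal{F}$ is denoted as $\mathcal{E} \otimes \mathcal{F}$ and is represented graphically as

[diagram omitted: boxes $\mathcal{E}$ and $\mathcal{F}$ drawn side by side, equated with a single box $\mathcal{E} \otimes \mathcal{F}$]

The graphical notation is justified by the requirement of the following condition

$$(\mathcal{E} \otimes \mathcal{F}) \circ (\mathcal{G} \otimes \mathcal{H}) = (\mathcal{E} \circ \mathcal{G}) \otimes (\mathcal{F} \circ \mathcal{H}), \tag{7}$$

where $\mathcal{E}$, $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$ are arbitrary events of matching types. Such a condition is
necessary for the graphical notation to make sense, since in graphical notation
the two sides of Eq. (7) look exactly the same. In addition to Eq. (7), parallel
composition is required to satisfy the condition

$$\mathcal{I}_{A \otimes B} = \mathcal{I}_A \otimes \mathcal{I}_B. \tag{8}$$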

Mathematically, the presence of parallel composition turns the category of 
events into a strict monoidal category, whose key properties are summarized 
by Eqs. o, 0, and ([8|). We denote such category by Transf. 
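
To make the interchange law concrete, the following sketch checks Eqs. (7) and (8) in one familiar instance of the framework: classical theory, where events are (sub)stochastic matrices, sequential composition is the matrix product, and parallel composition is the Kronecker product. The matrices and the helper names `seq`, `par`, and `identity` are illustrative assumptions.

```python
from fractions import Fraction as Fr

def seq(f, e):
    """Sequential composition F o E as a matrix product (E acts first)."""
    return [[sum(f[i][k] * e[k][j] for k in range(len(e)))
             for j in range(len(e[0]))] for i in range(len(f))]

def par(e, f):
    """Parallel composition E (x) F as a Kronecker product."""
    return [[e[i][j] * f[k][l]
             for j in range(len(e[0])) for l in range(len(f[0]))]
            for i in range(len(e)) for k in range(len(f))]

def identity(n):
    """Identity event on a classical system with n values."""
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

# Four hypothetical events on classical bits (each column sums to 1).
E = [[Fr(1, 2), Fr(1, 3)], [Fr(1, 2), Fr(2, 3)]]
F = [[Fr(1), Fr(1, 4)], [Fr(0), Fr(3, 4)]]
G = [[Fr(0), Fr(1)], [Fr(1), Fr(0)]]
H = [[Fr(2, 5), Fr(1)], [Fr(3, 5), Fr(0)]]

lhs = seq(par(E, F), par(G, H))   # (E (x) F) o (G (x) H)
rhs = par(seq(E, G), seq(F, H))   # (E o G) (x) (F o H)
assert lhs == rhs                                    # Eq. (7)
assert par(identity(2), identity(2)) == identity(4)  # Eq. (8)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a numerical tolerance.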


2.1.4. Reversible events

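An event $\mathcal{E}$ of type $A \to B$ is reversible iff there exists another event $\mathcal{F}$, of
type $B \to A$, such that

$$\mathcal{F} \circ \mathcal{E} = \mathcal{I}_A \tag{9}$$

and

$$\mathcal{E} \circ \mathcal{F} = \mathcal{I}_B. \tag{10}$$

When this is the case, we write $\mathcal{F} = \mathcal{E}^{-1}$ and we say that systems A and B are
operationally equivalent (or simply equivalent).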

We denote by $\mathrm{RevTransf}(A \to B)$ the set of reversible events of type $A \to B$.
Such a set (which may be empty) depends on the specific theory. In general, we
require the existence of a reversible event that swaps pairs of systems. Given
two systems A and B, the swap of A with B, denoted by $\mathcal{S}_{A,B}$, is a reversible
event of type $(A \otimes B) \to (B \otimes A)$ satisfying the condition

$$\mathcal{S}_{A',B'} \circ (\mathcal{E} \otimes \mathcal{F}) = (\mathcal{F} \otimes \mathcal{E}) \circ \mathcal{S}_{A,B} \tag{11}$$

for arbitrary systems A, B, A', B' and arbitrary events $\mathcal{E}$ of type $A \to A'$ and
$\mathcal{F}$ of type $B \to B'$, as well as the conditions

$$\mathcal{S}_{B,A} \circ \mathcal{S}_{A,B} = \mathcal{I}_A \otimes \mathcal{I}_B \tag{12}$$

and

[diagram omitted] (13)

The presence of the swap, with the related equations (11), (12), and (13), turns
the strict monoidal category into a strict symmetric monoidal category [45]
(strict SMC, for short).


2.1.5. Tests 

A test represents a process, which can generally be non-deterministic, i.e. it
can result in multiple alternative events. Specifically, a test of type $A \to B$ is
a collection of events of type $A \to B$, labelled by outcomes in a suitable outcome
set X. The test $\boldsymbol{\mathcal{E}} := \{\mathcal{E}_x\}_{x \in X}$ is represented graphically as

[diagram omitted: a box labelled $\{\mathcal{E}_x\}_{x \in X}$ with input wire A and output wire B]

When two events/transformations belong to the same test, we say that they are
coexisting.

The set of tests of type $A \to B$ with outcomes in X will be denoted by
$\mathrm{Tests}(A \to B, X)$. We will restrict our attention to tests with a finite outcome
set.

Tests with $|X| = 1$ are called deterministic, because only one event can take
place. We will often identify a deterministic test $\{\mathcal{E}_{x_0}\}$ with the corresponding
event $\mathcal{E}_{x_0}$, saying that $\mathcal{E}_{x_0}$ is a deterministic event (or a deterministic
transformation). The deterministic transformations form a strict symmetric monoidal
subcategory of Transf, denoted by DetTransf.

Some types of tests are particularly important and deserve a name of their
own. A test of type $I \to A$ is a preparation-test (or an ensemble), which prepares
system A in a non-deterministic way, with the possible preparations labelled by
different outcomes. A test of type $A \to I$ is an observation-test, corresponding to
a demolition measurement that absorbs system A while producing an outcome.


2.1.6. Composition of tests 

Not all collections of events are “tests”. Whether or not a specific collection
is a test is determined by the theory, compatibly with two basic requirements:

1. the set of tests must be closed under sequential and parallel composition


2. the set of tests must contain deterministic tests corresponding to reversible 
events. 

Let us discuss these requirements in more detail: 

1. The sequential composition of two tests $\boldsymbol{\mathcal{E}} = \{\mathcal{E}_x\}_{x \in X}$ and $\boldsymbol{\mathcal{F}} = \{\mathcal{F}_y\}_{y \in Y}$
of matching types is defined as

$$\boldsymbol{\mathcal{F}} \circ \boldsymbol{\mathcal{E}} := \{\mathcal{F}_y \circ \mathcal{E}_x\}_{(x,y) \in X \times Y}.$$

The test $\boldsymbol{\mathcal{F}} \circ \boldsymbol{\mathcal{E}}$ represents a cascade of two (generally non-deterministic)
processes, wherein each process can result in a number of alternative
events. Similarly, the parallel composition of two tests is defined as

$$\boldsymbol{\mathcal{E}} \otimes \boldsymbol{\mathcal{F}} := \{\mathcal{E}_x \otimes \mathcal{F}_y\}_{(x,y) \in X \times Y}$$

and represents two non-deterministic processes occurring in parallel. The
composition of tests induces a composition of their outcome spaces via the
Cartesian product. As a consequence, the set of all outcome spaces must
be closed under this operation. We will denote such a set by Outcomes.

2. If $\mathcal{U}$ is a reversible event of type $A \to B$, we require that there exists a
deterministic test $\boldsymbol{\mathcal{U}} := \{\mathcal{U}\}$. In particular, there must be a deterministic
test $\boldsymbol{\mathcal{I}}_A := \{\mathcal{I}_A\}$ corresponding to the identity on system A and a deterministic
test $\boldsymbol{\mathcal{S}}_{A,B} := \{\mathcal{S}_{A,B}\}$ corresponding to the swap of systems A and
B.
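
The sequential composition of tests can be sketched in code for the classical case, where each event of a test is a substochastic matrix and a test is a dictionary from outcomes to events. The two example tests and the helper names `matmul` and `seq_test` are hypothetical; the point is only that the composite test is indexed by the Cartesian product of the outcome sets.

```python
from fractions import Fraction as Fr
from itertools import product

def matmul(f, e):
    """Matrix product f . e (e acts first)."""
    return [[sum(f[i][k] * e[k][j] for k in range(len(e)))
             for j in range(len(e[0]))] for i in range(len(f))]

def seq_test(test_f, test_e):
    """Sequential composition of tests: {F_y o E_x}, indexed by (x, y)."""
    return {(x, y): matmul(fy, ex)
            for (x, ex), (y, fy) in product(test_e.items(), test_f.items())}

# Test E: read a classical bit (outcome x) and leave it in place.
test_e = {0: [[Fr(1), Fr(0)], [Fr(0), Fr(0)]],
          1: [[Fr(0), Fr(0)], [Fr(0), Fr(1)]]}
# Test F: with probability 1/2 do nothing ("keep"), otherwise flip the bit ("flip").
test_f = {"keep": [[Fr(1, 2), Fr(0)], [Fr(0), Fr(1, 2)]],
          "flip": [[Fr(0), Fr(1, 2)], [Fr(1, 2), Fr(0)]]}

composite = seq_test(test_f, test_e)

# Summing over all outcome pairs gives a single column-stochastic matrix.
total = [[sum(m[i][j] for m in composite.values()) for j in range(2)]
         for i in range(2)]
```

Here `composite` has four outcomes, one for each pair in X × Y, in line with the requirement that Outcomes be closed under Cartesian product.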


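Note that all the basic equations valid for events can be lifted to tests: for example,
the identity test acts as identity element with respect to the composition,
that is, one has

$$\boldsymbol{\mathcal{E}} \circ \boldsymbol{\mathcal{I}}_A = \boldsymbol{\mathcal{E}} \tag{14}$$

and

$$\boldsymbol{\mathcal{I}}_A \circ \boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{F}} \tag{15}$$

for arbitrary systems A, B and for arbitrary tests $\boldsymbol{\mathcal{E}}$ and $\boldsymbol{\mathcal{F}}$ of types $A \to B$ and
$B \to A$, respectively. Since events form a strict SMC, the tests also form a strict
SMC, which we denote by Tests.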


2.1.7. Summary about the operational structure 

Summarizing the ideas introduced so far, an operational structure consists 
of a triple 

Op = (Transf, Outcomes, Tests), 

where Transf is a strict symmetric monoidal category, Outcomes is a collection 
of sets closed under Cartesian product, and Tests is a strict symmetric monoidal 
category, related to Transf and Outcomes as described in the previous paragraph. 
Intuitively, the operational structure describes 

1. what can be done (connecting tests) 

2. what can be observed (outcomes), and 

3. what can happen (events). 


















2.2. Probabilistic structure 

The goal of a physical theory is not only to describe a class of experiments, 
but also to make predictions about the outcomes of such experiments. In the
following we show how this can be accomplished by adding a probabilistic structure
on top of the operational structure.

2.2.1. Assigning probabilities 

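An experiment consists of a sequence of tests that starts from a preparation-test
and ends with an observation-test, leaving no open wires, as in the following
example

[diagram omitted: a circuit with no open wires, starting with a preparation-test and ending with an observation-test] (16)
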


If we compose all the tests involved in an experiment, we obtain a single test, 
which transforms the trivial system into itself. In order to make predictions on 
the outcomes of the experiment, we need a rule assigning a probability to the 
events of such test. The rule is provided by the probabilistic structure of the 
theory: 


Definition 1 (Probabilistic structure). Let Op be an operational structure. A
probabilistic structure for Op is a map $\mathrm{Prob} : \mathrm{Transf}(I \to I) \to [0,1]$, which
associates to each scalar $s$ a probability $\mathrm{Prob}(s)$, in accordance with the following
two requirements:

1. Consistency: $\sum_{x \in X} \mathrm{Prob}(s_x) = 1$ for every outcome set $X \in \mathrm{Outcomes}$
and for every test $\boldsymbol{s} \in \mathrm{Tests}(I \to I, X)$

2. Independence: $\mathrm{Prob}(s \otimes t) = \mathrm{Prob}(s)\,\mathrm{Prob}(t)$ for every pair of scalars $s$
and $t$.


The consistency requirement guarantees that we can interpret $\mathrm{Prob}(s_x)$ as
the probability of the outcome $x \in X$. The independence requirement guarantees
that experiments that involve only independent tests on two systems give rise
to uncorrelated outcomes. As observed by Hardy [2^, [s^, independence is
equivalent to the requirement that probabilities can be assigned to the outcomes
of an experiment in a way that is independent of the context in which the
experiment is performed. Note that the map Prob need not be surjective: for
example, in a deterministic theory the range of Prob consists only of the values 0 and
1.
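
The two requirements of Definition 1 can be checked on a small classical example. In this sketch, which is an illustration under assumed data rather than part of the framework, preparation-events are sub-normalized probability vectors, observation-events are vectors with entries in [0,1], and the scalar obtained by composing them is their dot product.

```python
from fractions import Fraction as Fr
from itertools import product

def scalar(effect, state):
    """Scalar obtained by composing an observation-event with a preparation-event."""
    return sum(a * r for a, r in zip(effect, state))

def kron(u, v):
    """Parallel composition of two classical states, or of two classical effects."""
    return [a * b for a in u for b in v]

# Hypothetical preparation-test on a bit: the two states sum to (2/3, 1/3).
prep = {"x0": [Fr(1, 2), Fr(0)], "x1": [Fr(1, 6), Fr(1, 3)]}
# Hypothetical observation-test: the two effects sum to (1, 1).
obs = {"y0": [Fr(1), Fr(1, 4)], "y1": [Fr(0), Fr(3, 4)]}

# The experiment is a test of type I -> I with outcomes in X x Y.
experiment = {(x, y): scalar(a, r)
              for (x, r), (y, a) in product(prep.items(), obs.items())}
assert sum(experiment.values()) == 1          # consistency

# Two scalars composed in parallel multiply.
s = scalar(obs["y0"], prep["x1"])
t = scalar(obs["y1"], prep["x1"])
joint = scalar(kron(obs["y0"], obs["y1"]), kron(prep["x1"], prep["x1"]))
assert joint == s * t                         # independence
```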

We are now ready to give the formal definition of OPT: 

Definition 2. An operational-probabilistic theory $\Theta$ is a pair (Op, Prob)
consisting of an operational structure Op and of a probabilistic structure for Op.


2.2.2. Statistically equivalent events 

Once probabilities are introduced, it is natural to identify events that give 
rise to the same probabilities in all possible circuits. Precisely, we say that two 
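events of type $A \to B$, say $\mathcal{E}$ and $\mathcal{E}'$, are statistically equivalent iff

$$\mathrm{Prob}\big[m \circ (\mathcal{E} \otimes \mathcal{I}_R) \circ \rho\big] = \mathrm{Prob}\big[m \circ (\mathcal{E}' \otimes \mathcal{I}_R) \circ \rho\big]$$

for every system R, every preparation-event $\rho \in \mathrm{Transf}(I \to A \otimes R)$ and every
observation-event $m \in \mathrm{Transf}(B \otimes R \to I)$. We denote by $[\mathcal{E}]$ the equivalence
class of the event $\mathcal{E}$.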

Equivalence classes can be composed in sequence and in parallel in the obvious
way

$$[\mathcal{F}] \circ [\mathcal{E}] := [\mathcal{F} \circ \mathcal{E}], \qquad [\mathcal{E}] \otimes [\mathcal{F}] := [\mathcal{E} \otimes \mathcal{F}]$$

and it is easily verified that both definitions are well-posed. Furthermore, $[\mathcal{I}_A]$
and $[\mathcal{S}_{A,B}]$ behave like the identity on A and the swap between A and B, respectively.
As a result, the equivalence classes of events form a strict SMC, which
we denote by [Transf].

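Similar considerations apply to tests: the equivalence class of a test $\boldsymbol{\mathcal{E}} = \{\mathcal{E}_x\}_{x \in X}$
is defined as $[\boldsymbol{\mathcal{E}}] := \{[\mathcal{E}_x]\}_{x \in X}$, and the sequential/parallel composition
of equivalence classes of tests are induced by the sequential/parallel composition
of events:

$$[\boldsymbol{\mathcal{F}}] \circ [\boldsymbol{\mathcal{E}}] := [\boldsymbol{\mathcal{F}} \circ \boldsymbol{\mathcal{E}}], \qquad [\boldsymbol{\mathcal{E}}] \otimes [\boldsymbol{\mathcal{F}}] := [\boldsymbol{\mathcal{E}} \otimes \boldsymbol{\mathcal{F}}].$$

Again, the equivalence classes $[\boldsymbol{\mathcal{I}}_A]$ and $[\boldsymbol{\mathcal{S}}_{A,B}]$ behave like the identity and the
swap. As a result, the equivalence classes of tests form a strict SMC, which we
denote by [Tests].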


2.2.3. The quotient OPT 

The notion of statistical equivalence allowed us to transform the original
operational structure Op = (Transf, Outcomes, Tests) into a new operational structure
[Op] := ([Transf], Outcomes, [Tests]), which we call the quotient operational
structure. The operational structure [Op] comes with an obvious probabilistic
structure [Prob], defined as

$$[\mathrm{Prob}]([s]) := \mathrm{Prob}(s) \qquad \forall s \in \mathrm{Transf}(I \to I).$$

It is indeed immediate to verify that the consistency and independence conditions
in Definition 1 are satisfied. As a result, the original OPT $\Theta$ = (Op, Prob)
has been turned into a new OPT $[\Theta]$ := ([Op], [Prob]), which we call the quotient
OPT. Intuitively, the quotient OPT contains all the information that is
statistically relevant, disregarding those distinctions that have no consequences
for the purpose of making probabilistic predictions.

In the following we will focus on quotient OPTs: by default, an OPT will
be a quotient OPT. Accordingly, we will omit the symbol of equivalence class
everywhere and write $\Theta$ = (Op, Prob), assuming that equivalence classes have
been already taken from the start. This is equivalent to requiring the following
separation property [dTj ]:
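Definition 3. An OPT satisfies the separation property iff for every pair of
systems A and B and every pair of events $\mathcal{E}$ and $\mathcal{E}'$ of type $A \to B$ the condition

$$\mathrm{Prob}\big[m \circ (\mathcal{E} \otimes \mathcal{I}_R) \circ \rho\big] = \mathrm{Prob}\big[m \circ (\mathcal{E}' \otimes \mathcal{I}_R) \circ \rho\big] \qquad \forall R \in \mathrm{Sys},\ \forall \rho \in \mathrm{Transf}(I \to A \otimes R),\ \forall m \in \mathrm{Transf}(B \otimes R \to I)$$

implies $\mathcal{E} = \mathcal{E}'$.
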


In a quotient OPT preparation-events (respectively, observation-events) will
be called states (respectively, effects) and we will use the notation $\mathrm{St}(A) :=
\mathrm{Transf}(I \to A)$ (respectively, $\mathrm{Eff}(A) := \mathrm{Transf}(A \to I)$).


2.2.4. Vector space representation of an OPT

OPTs satisfying the separation property have a convenient representation in 
terms of ordered vector spaces and positive maps. The construction proceeds 
in four steps: 

1. The separation property guarantees that a scalar $s$ can be identified with
its probability $\mathrm{Prob}(s)$. Hence, from now on we will omit Prob and will
identify the set of scalars $\mathrm{Transf}(I \to I)$ with a subset of the real interval
$[0, 1]$.

2. By the separation property, a state $\rho \in \mathrm{St}(A)$ can be identified with the
real-valued function $\hat{\rho} : \mathrm{Eff}(A) \to \mathbb{R}$ defined by

$$\hat{\rho}(m) := (m|\rho)$$

(indeed, one has $\hat{\rho} = \hat{\sigma}$ if and only if $\rho = \sigma$). Since real-valued functions
form a vector space, we can define the vector (sub)space spanned by the
states of system A as

$$\mathrm{St}_{\mathbb{R}}(A) := \mathrm{Span}_{\mathbb{R}}\{\hat{\rho} \mid \rho \in \mathrm{St}(A)\}.$$

Limiting ourselves to linear combinations with positive coefficients we obtain
the proper cone $\mathrm{St}_+(A)$, which turns $\mathrm{St}_{\mathbb{R}}(A)$ into an ordered vector
space.

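3. Every effect $m \in \mathrm{Eff}(A)$ defines a linear function $\hat{m} : \mathrm{St}_{\mathbb{R}}(A) \to \mathbb{R}$, via the
relation

$$\hat{m}\Big(\sum_i c_i \hat{\rho}_i\Big) := \sum_i c_i\,(m|\rho_i), \qquad \forall \{c_i\} \subset \mathbb{R},\ \forall \{\rho_i\} \subset \mathrm{St}(A).$$

It is immediate to see that the definition is well-posed, namely $\hat{m}\big(\sum_i c_i \hat{\rho}_i\big) =
\hat{m}\big(\sum_j c'_j \hat{\rho}'_j\big)$ whenever $\sum_i c_i \hat{\rho}_i = \sum_j c'_j \hat{\rho}'_j$. Again, the effect $m$ can be
identified with the linear function $\hat{m}$ thanks to the separation property.
Taking linear combinations of effects we obtain the vector space

$$\mathrm{Eff}_{\mathbb{R}}(A) := \mathrm{Span}_{\mathbb{R}}\{\hat{m} \mid m \in \mathrm{Eff}(A)\},$$

while restricting to positive linear combinations we obtain the proper cone
$\mathrm{Eff}_+(A)$. As a result, $\mathrm{Eff}_{\mathbb{R}}(A)$ is also an ordered vector space.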
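4. Every event $\mathcal{E}$ of type $A \to B$ induces a linear map $\hat{\mathcal{E}} : \mathrm{St}_{\mathbb{R}}(A) \to \mathrm{St}_{\mathbb{R}}(B)$,
via the definition

$$\hat{\mathcal{E}}\Big(\sum_i c_i \hat{\rho}_i\Big) := \sum_i c_i\,\widehat{\mathcal{E} \circ \rho_i}, \qquad \forall \{c_i\} \subset \mathbb{R},\ \forall \{\rho_i\} \subset \mathrm{St}(A).$$

Again, it is not hard to see that the definition is well-posed, namely that
$\hat{\mathcal{E}}\big(\sum_i c_i \hat{\rho}_i\big) = \hat{\mathcal{E}}\big(\sum_j c'_j \hat{\rho}'_j\big)$ whenever $\sum_i c_i \hat{\rho}_i = \sum_j c'_j \hat{\rho}'_j$. Note that the
map $\hat{\mathcal{E}}$ is not only linear, but also positive: indeed, it sends elements of the
cone $\mathrm{St}_+(A)$ to elements of the cone $\mathrm{St}_+(B)$. We call $\hat{\mathcal{E}}$ the state change
associated to $\mathcal{E}$.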

At this point, a few remarks are in order: 

1. Linearity vs convexity. Traditionally, the linearity of state changes has
been argued from the assumption that the state space St(A) is convex.
However, our argument shows that such an assumption is not needed: the
probabilistic structure alone suffices to define the linear map $\hat{\mathcal{E}}$.

2. Finite vs infinite dimensional systems. For a given system A, we define $D_A$
to be the dimension of the vector space $\mathrm{St}_{\mathbb{R}}(A)$ and we say that system
A is finite dimensional if $D_A$ is finite. For finite systems, one has the
equality $\mathrm{Eff}_{\mathbb{R}}(A) = \mathrm{St}_{\mathbb{R}}(A)^*$, where $\mathrm{St}_{\mathbb{R}}(A)^*$ is the vector space of all linear
functionals on $\mathrm{St}_{\mathbb{R}}(A)$. For infinite dimensional systems, such an equality
may not hold.

3. The no-restriction hypothesis. Since effects are identified with positive
linear functions, one has the inclusion $\mathrm{Eff}_+(A) \subseteq \mathrm{St}_+(A)^*$, where $\mathrm{St}_+(A)^*$
denotes the dual cone of $\mathrm{St}_+(A)$,

$$\mathrm{St}_+(A)^* := \{m \in \mathrm{St}_{\mathbb{R}}(A)^* \mid m(\rho) \ge 0 \ \ \forall \rho \in \mathrm{St}_+(A)\}. \tag{17}$$

Even for finite dimensional systems, the inclusion $\mathrm{Eff}_+(A) \subseteq \mathrm{St}_+(A)^*$ may
not be an equality. The assumption $\mathrm{Eff}_+(A) = \mathrm{St}_+(A)^*$ is known as the
No-Restriction Hypothesis [1]. We stress that such an assumption is not made
in our derivation.

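4. Transformations vs linear maps. Unlike in the case of states and effects,
the correspondence between the transformation $\mathcal{E}$ and the linear map $\hat{\mathcal{E}}$
may not be one-to-one. The reason for this is that the difference between
two transformations $\mathcal{E}$ and $\mathcal{E}'$ may show up when one applies them locally
on a part of a composite system: one can have $\mathcal{E} \otimes \mathcal{I}_R \neq \mathcal{E}' \otimes \mathcal{I}_R$ for some
$R \in \mathrm{Sys}$ even if $\hat{\mathcal{E}} = \hat{\mathcal{E}}'$. This problem disappears if one assumes the axiom
of Local Tomography, as we will see later in this chapter. In the absence of
Local Tomography, however, the transformation $\mathcal{E}$ can still be identified
with a linear map: for this purpose, one can choose the linear map $\hat{\mathcal{E}}_\oplus$
defined by

$$\hat{\mathcal{E}}_\oplus := \bigoplus_{R \in \mathrm{Sys}} \widehat{\mathcal{E} \otimes \mathcal{I}_R}. \tag{18}$$

The map $\hat{\mathcal{E}}_\oplus$ transforms elements of the (infinite-dimensional) vector space
$\mathrm{St}_{\mathbb{R},\oplus}(A) := \bigoplus_{R \in \mathrm{Sys}} \mathrm{St}_{\mathbb{R}}(A \otimes R)$ into elements of the (infinite-dimensional)
vector space $\mathrm{St}_{\mathbb{R},\oplus}(B) := \bigoplus_{R \in \mathrm{Sys}} \mathrm{St}_{\mathbb{R}}(B \otimes R)$. Then, the separation property
guarantees that the correspondence between $\mathcal{E}$ and $\hat{\mathcal{E}}_\oplus$ is one-to-one.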

5. The vector space of transformations. So far we have defined the vector
spaces of states and effects. A vector space of transformations can be
defined using the one-to-one correspondence with the linear maps in Eq.
(18) and setting

$$\mathrm{Transf}_{\mathbb{R}}(A \to B) := \mathrm{Span}_{\mathbb{R}}\{\mathrm{Transf}(A \to B)\}. \tag{19}$$

Again, a proper cone $\mathrm{Transf}_+(A \to B)$ can be defined by restricting the
attention to linear combinations with positive coefficients. Note that, in
general, the vector space $\mathrm{Transf}_{\mathbb{R}}(A \to B)$ and the cone $\mathrm{Transf}_+(A \to B)$
can be infinite-dimensional even if both systems A and B are finite
dimensional. However, this is not the case when the theory satisfies
Local Tomography.
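
The dual cone of Eq. (17) in remark 3 can be illustrated numerically. When the cone of states is generated by finitely many vectors, a functional is positive on the whole cone exactly when it is non-negative on each generator. The sketch below uses a hypothetical state space whose normalized states form a square (this choice of generators is an assumption made for illustration and is not taken from the text).

```python
from fractions import Fraction as Fr

def in_dual_cone(functional, generators):
    """True iff the functional is non-negative on every generator of the cone."""
    return all(sum(m * r for m, r in zip(functional, g)) >= 0
               for g in generators)

# Hypothetical generators of a state cone: the four corners of a square,
# with the last coordinate playing the role of normalization.
square = [(1, 1, 1), (1, -1, 1), (-1, 1, 1), (-1, -1, 1)]

# This functional takes the values 1, 1, 0, 0 on the corners: it is in the dual cone.
assert in_dual_cone((Fr(1, 2), 0, Fr(1, 2)), square)
# This one takes the value -2 on the last corner: it is not.
assert not in_dual_cone((1, 1, 0), square)
```

Whether every element of the dual cone is actually an allowed effect is a separate question: that is precisely the content of the No-Restriction Hypothesis, which the derivation does not assume.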

2.2.5. Closure under coarse-graining 

A key notion that comes with the probabilistic structure is the notion of
coarse-graining: given a test $\boldsymbol{\mathcal{T}} = \{\mathcal{T}_x\}_{x \in X}$, one can decide to identify some
outcomes, thus obtaining another, coarse-grained test. Mathematically, a coarse-graining
is defined by partitioning the outcome set X into mutually disjoint
subsets $\{X_y\}_{y \in Y}$. Relative to such a partition, the coarse-graining of the test $\boldsymbol{\mathcal{T}}$
is the test $\boldsymbol{\mathcal{T}}' = \{\mathcal{T}'_y\}_{y \in Y}$ defined by

$$\mathcal{T}'_y := \sum_{x \in X_y} \mathcal{T}_x, \tag{20}$$

setting $\mathcal{T}'_y = 0$ for $X_y = \emptyset$, where 0 is the zero element of the vector space
$\mathrm{Transf}_{\mathbb{R}}(A \to B)$. Note that, by calling $\boldsymbol{\mathcal{T}}'$ a test we have implicitly made two
assumptions, namely that

1. the set Y belongs to Outcomes

2. the collection $\{\mathcal{T}'_y\}_{y \in Y} \subset \mathrm{Transf}_{\mathbb{R}}(A \to B)$ belongs to $\mathrm{Tests}(A \to B, Y)$.

From now on, we will require that our OPT is closed under coarse-graining,
meaning that the above conditions are satisfied.

By coarse-graining over all outcomes of a test $\boldsymbol{\mathcal{T}} \in \mathrm{Tests}(A \to B, X)$ one
obtains a deterministic test, identified with the deterministic transformation
$\mathcal{T} := \sum_{x \in X} \mathcal{T}_x \in \mathrm{DetTransf}(A \to B)$. In particular, when a preparation test
$\boldsymbol{\rho} \in \mathrm{Tests}(I \to A, X)$ satisfies $\sum_{x \in X} \rho_x = \rho$, we say that the test $\boldsymbol{\rho}$ is an ensemble
decomposition of $\rho$.
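
Coarse-graining as in Eq. (20) amounts to summing the events within each block of a partition. A minimal sketch, assuming a classical preparation-test whose events are sub-normalized probability vectors (the function name `coarse_grain` and the example ensemble are hypothetical):

```python
from fractions import Fraction as Fr

def coarse_grain(test, partition):
    """Sum the events of `test` within each block of `partition`.

    test: dict mapping outcome x to a vector; partition: dict mapping each
    coarse outcome y to the list X_y. An empty block yields the zero vector.
    """
    covered = [x for xs in partition.values() for x in xs]
    if sorted(covered) != sorted(test):
        raise ValueError("partition must cover each outcome exactly once")
    dim = len(next(iter(test.values())))
    return {y: ([sum(col) for col in zip(*(test[x] for x in xs))]
                if xs else [Fr(0)] * dim)
            for y, xs in partition.items()}

# Hypothetical ensemble on a classical bit; the three states sum to (2/3, 1/3).
ensemble = {"a": [Fr(1, 2), Fr(0)],
            "b": [Fr(1, 6), Fr(1, 12)],
            "c": [Fr(0), Fr(1, 4)]}

coarse = coarse_grain(ensemble, {"low": ["a", "b"], "high": ["c"], "never": []})
# Coarse-graining over all outcomes gives the deterministic state rho.
full = coarse_grain(ensemble, {"all": ["a", "b", "c"]})
```

In this example `ensemble` is an ensemble decomposition of the state `full["all"]`, in the sense defined above.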


^Note that the summation is well-defined thanks to the vector space structure of
$\mathrm{Transf}_{\mathbb{R}}(A \to B)$.
2.2.6. Summary of the OPT framework

Let us sum up the main points discussed so far. We defined an OPT as a pair
$\Theta$ = (Op, Prob), consisting of an operational structure Op = (Transf, Outcomes, Tests)
and of a probabilistic structure Prob that assigns probabilities to scalars. We
restricted our attention to OPTs that satisfy the Separation Property
(Definition 3), which implies that one can identify scalars with probabilities, states
with elements of suitable vector spaces, and effects with linear functionals over
them. Transformations with nontrivial input and output induce linear maps
on the corresponding state spaces. Finally, in agreement with the probabilistic
interpretation, we demanded that the theory $\Theta$ be closed under coarse-graining.

3. Background of the quantum reconstruction 

In this section we provide some background that will be useful for our reconstruction
of quantum theory. We start by reviewing three standing assumptions:
finite-dimensionality, non-determinism, and closure under operational limits.
We will then review the operational tasks that motivate our axioms.

3.1. Standing assumptions 

Here we introduce three standing assumptions that will be made in the rest
of the chapter. These assumptions are common to all recent axiomatizations of
quantum theory, and could even be incorporated in the OPT framework.
We keep them separate from the rest, both for clarity of presentation and for the
sake of keeping the OPT framework as flexible as possible. The assumptions
are the following:

1. Finite dimensionality. We restrict our attention to finite systems, i.e. systems
with finite dimensional state spaces. Operationally, this means that
the state of every system can be identified from the statistics of a finite
number of finite-outcome measurements. Of course, the implicit assumption
here is that finite systems exist and form a sub-theory of our theory,
meaning that the operational structure Op contains a non-trivial substructure
FiniteOp, consisting of transformations, outcome sets, and tests
involving only finite systems.

2. Non-determinism. While the OPT framework accommodates a variety
of theories, here we focus on OPTs that are non-deterministic, meaning
that there exists at least one experiment for which the outcome is not
determined a priori. Mathematically, this means that the range of the
probability function Prob is not just {0,1}. Note that non-determinism
is a weaker assumption than convexity of the state spaces: indeed, there
exist examples of theories, such as Spekkens’ toy theory, that are
non-deterministic and yet do not have convex state spaces.

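3. Closure under operational limits. Suppose that
$(\mathcal{T}_n)_{n \in \mathbb{N}}$ is a sequence of transformations of type
$\mathrm{A} \to \mathrm{B}$ and that $\mathcal{T}$ is an element of the vector
space $\mathsf{Transf}_{\mathbb{R}}(\mathrm{A} \to \mathrm{B})$ such that

\[ \lim_{n \to \infty} (m|\,(\mathcal{T}_n \otimes \mathcal{I}_{\mathrm{R}})\,|\rho)
   = (m|\,(\mathcal{T} \otimes \mathcal{I}_{\mathrm{R}})\,|\rho)
   \qquad \forall \mathrm{R} \in \mathsf{Sys}, \quad
   \forall \rho \in \mathsf{Transf}(\mathrm{I} \to \mathrm{A} \otimes \mathrm{R}), \quad
   \forall m \in \mathsf{Transf}(\mathrm{B} \otimes \mathrm{R} \to \mathrm{I}), \]

meaning that the probability of every experiment involving $\mathcal{T}_n$
converges to the probability of a hypothetical experiment involving
$\mathcal{T}$. When this is the case, we assume that $\mathcal{T}$ belongs to
$\mathsf{Transf}(\mathrm{A} \to \mathrm{B})$. Operationally, one can think of the
sequence $(\mathcal{T}_n)_{n \in \mathbb{N}}$ as a limit procedure to implement
the transformation $\mathcal{T}$.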

3.2. Basic operational tasks 

We now give a brief list of the operational notions on which our axioms are 
based. 


3.2.1. Signalling 

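When a number of devices are connected into a network, it is natural to ask
whether one node of the network can signal to another. For example, given the
experiment

[Circuit diagram not recovered: a network in which the test
$\boldsymbol{\mathcal{T}}$ is followed by the test $\boldsymbol{\mathcal{S}}$]   (21)

one can ask whether the choice of the test $\boldsymbol{\mathcal{T}}$ can
influence the outcome of the test $\boldsymbol{\mathcal{S}}$. Precisely, the
question is whether or not the marginal probability distribution for the
outcomes of $\boldsymbol{\mathcal{S}}$ (obtained by summing over the outcomes
of all the other tests in the network) depends on $\boldsymbol{\mathcal{T}}$.
Denoting the marginal probability distribution by
$p(x|\boldsymbol{\mathcal{T}})$, $x \in X$, we say that the node occupied by
the test $\boldsymbol{\mathcal{T}}$ does not signal to the node occupied by the
test $\boldsymbol{\mathcal{S}}$ iff

\[ p(x|\boldsymbol{\mathcal{T}}_0) = p(x|\boldsymbol{\mathcal{T}}_1) \qquad \forall x \in X , \]

for every possible choice of tests $\boldsymbol{\mathcal{T}}_0$ and
$\boldsymbol{\mathcal{T}}_1$. Similarly, one can ask whether the node occupied
by the test $\boldsymbol{\mathcal{S}}$ can signal to the node occupied by the
test $\boldsymbol{\mathcal{T}}$. Now, note that the test
$\boldsymbol{\mathcal{S}}$ is performed after the test
$\boldsymbol{\mathcal{T}}$: if the node occupied by $\boldsymbol{\mathcal{S}}$
can signal to the node occupied by $\boldsymbol{\mathcal{T}}$, we say that the
circuit of Eq. (21) allows for signalling from the future to the past.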
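The no-signalling condition above is a statement about marginals, so it can be checked mechanically once the joint outcome distributions are known. The following Python sketch is only an illustration: the function names and the toy distributions are hypothetical and do not come from the OPT framework itself. Each joint distribution is indexed by pairs $(x, y)$, with $x$ the outcome of $\boldsymbol{\mathcal{S}}$ and $y$ the outcome of the chosen test $\boldsymbol{\mathcal{T}}_i$.

```python
def marginal(joint, x_outcomes):
    """Marginal p(x) obtained by summing a joint {(x, rest): prob} over 'rest'."""
    return {x: sum(p for (xx, _), p in joint.items() if xx == x)
            for x in x_outcomes}


def is_no_signalling(joint_per_choice, x_outcomes, tol=1e-12):
    """True iff the marginal on x is the same for every choice of the other test."""
    margs = [marginal(j, x_outcomes) for j in joint_per_choice.values()]
    return all(abs(m[x] - margs[0][x]) <= tol for m in margs for x in x_outcomes)


# Hypothetical toy data: keys are (x, y), values are joint probabilities.
independent = {
    "T0": {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
    "T1": {(0, 0): 0.50, (0, 1): 0.00, (1, 0): 0.10, (1, 1): 0.40},
}
signalling = {
    "T0": {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 0.0},
    "T1": {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 0.5},
}

print(is_no_signalling(independent, [0, 1]))  # marginals on x agree
print(is_no_signalling(signalling, [0, 1]))   # marginals on x differ
```

In the first data set the joint statistics change with the choice of test, but the marginal on $x$ does not, which is all that the definition requires.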


3.2.2. Collecting side information 

Suppose that the test $\boldsymbol{\mathcal{T}} = \{\mathcal{T}_x\}_{x \in X}$ is
obtained from the test $\boldsymbol{\mathcal{T}}' = \{\mathcal{T}'_z\}_{z \in Z}$
via coarse-graining, namely

\[ \mathcal{T}_x = \sum_{z \in Z_x} \mathcal{T}'_z , \]

where $\{Z_x\}_{x \in X}$ is a partition of $Z$ into disjoint subsets. In this
case we say that $\boldsymbol{\mathcal{T}}'$ refines $\boldsymbol{\mathcal{T}}$.
Now, it is convenient to relabel the outcomes of $\boldsymbol{\mathcal{T}}'$ as
$z = (x, y)$, with $x \in X$ and $y \in Y_x$, and to write $\mathcal{T}'_{x,y}$
in place of $\mathcal{T}'_z$. In this way, we can think of the random variable
$y$ as side information, which is not accessible to the agent Alice performing
the test $\boldsymbol{\mathcal{T}}$, but may be accessible to some other agent
Eve. This picture is particularly relevant to cryptographic scenarios, wherein
Eve could be an eavesdropper attempting to collect as much information as
possible. In all such scenarios, a special role is played by those
transformations that do not leak any useful side information. We call such
transformations pure:

Definition 4. We say that a transformation $\mathcal{E}$ is pure iff for every
test $\boldsymbol{\mathcal{T}}$ containing $\mathcal{E}$ and for every test
$\boldsymbol{\mathcal{T}}'$ refining $\boldsymbol{\mathcal{T}}$ one has

\[ \mathcal{T}'_{x_0,y} = p_y \, \mathcal{T}_{x_0} \qquad \forall y \in Y_{x_0} , \tag{22} \]

where $x_0$ is the outcome such that $\mathcal{T}_{x_0} = \mathcal{E}$ and
$\{p_y\}$ is a probability distribution.

Informally, the purity condition (22) states that the side information
possessed by Eve is uncorrelated with the transformation $\mathcal{E}$ taking
place in Alice's laboratory. We denote the set of pure transformations of type
$\mathrm{A} \to \mathrm{B}$ by $\mathsf{PurTransf}(\mathrm{A} \to \mathrm{B})$. In
the special case of transformations with trivial input (respectively, trivial
output) we will use the notation $\mathsf{PurSt}(\mathrm{A})$ (respectively,
$\mathsf{PurEff}(\mathrm{A})$), referring to pure states (respectively, pure
effects). A pure test is a test consisting of pure transformations.

Transformations that are not necessarily pure will be called mixed. Among
the mixed transformations, the ones that are in the interior of the cone
$\mathsf{Transf}_+(\mathrm{A} \to \mathrm{B})$ play an important role. They are
defined as follows:

Definition 5. A transformation
$\mathcal{E} \in \mathsf{Transf}(\mathrm{A} \to \mathrm{B})$ is called internal
iff for every transformation
$\mathcal{T} \in \mathsf{Transf}(\mathrm{A} \to \mathrm{B})$ there exists a
transformation $\mathcal{G}$ and a scaling constant $\lambda > 0$ such that

1. $\mathcal{E} = \lambda \mathcal{T} + \mathcal{G}$

2. $\lambda \mathcal{T}$ and $\mathcal{G}$ coexist in a test.


[Footnote] In previous works, we used different names for transformations that
do not allow for side information: in some they were called atomic, while in
a popularized version they were called fine-grained. We apologize to our
readers for the changes of terminology, due to an ongoing search for the word
that best captures this operational concept. In this chapter, we adopted the
word pure, because i) this term is the standard one in the case of states and
ii) using the same term for transformations should hopefully ease the reading.
Still, a warning is in order: when the set of transformations
$\mathsf{Transf}(\mathrm{A} \to \mathrm{B})$ is convex, the pure transformations
$\mathsf{PurTransf}(\mathrm{A} \to \mathrm{B})$ may not coincide with the extreme
points of $\mathsf{Transf}(\mathrm{A} \to \mathrm{B})$. For example, in quantum
theory the identity effect $I_{\mathrm{A}}$ is an extreme point of the set of
effects, but is not pure in the sense of our definition because it can be
decomposed e.g. as $I_{\mathrm{A}} = \sum_n P_n$, where the effects
$\{P_n = |n\rangle\langle n| \;|\; n = 1, \dots, d_{\mathrm{A}}\}$ represent a
projective measurement on some orthonormal basis
$\{|n\rangle \;|\; n = 1, \dots, d_{\mathrm{A}}\}$.

[Footnote] Note that, in principle, our definition of "internal
transformations" may not include all the transformations in the interior of
the cone, because $\lambda \mathcal{T}$ and $\mathcal{G}$ may fail to coexist in
a test. However, this annoying discrepancy disappears under the mild
assumption that the set of transformations is convex. Later, we will justify
this assumption on the basis of the Causality axiom.

Roughly speaking, an internal transformation is compatible with the
occurrence of any other transformation of the same type. Internal
transformations with trivial input (output) will be called internal states
(internal effects).


3.2.3. State tomography 

The task of state tomography consists in identifying the state of a system
from the statistics of a restricted set of observations. Suppose that an
experimenter is able to perform a set of observation-tests and let $M$ be the
set of all effects appearing in such tests.

Definition 6. We say that the effects in $M$ are tomographically complete for
system $\mathrm{A}$ iff, for every pair of states $\rho$ and $\rho'$ of system
$\mathrm{A}$, one has the implication

\[ (m|\rho) = (m|\rho') \quad \forall m \in M
   \qquad \Longrightarrow \qquad \rho = \rho' . \]

In the contrapositive: if two states are different, then the difference can be
detected from the statistics of some effect in $M$.

Let us consider state tomography for composite systems. Suppose that two
experimenters Alice and Bob perform measurements on two systems $\mathrm{A}$
and $\mathrm{B}$, respectively, and that Alice (Bob) is able to perform the set
of measurements with effects $M$ ($N$). Then, by coordinating their choices of
measurements and by communicating the outcomes to each other, Alice and Bob
can observe the statistics of all product measurements. Hence, their set of
measurement effects will be

\[ M \otimes N := \{ m \otimes n \;|\; m \in M ,\ n \in N \} . \]

Now the question is: is there a choice of measurement effects $M$ and $N$ such
that the set $M \otimes N$ is tomographically complete? In the affirmative
case, we say that system $\mathrm{A} \otimes \mathrm{B}$ allows for local
tomography:

Definition 7. System $\mathrm{A} \otimes \mathrm{B}$ allows for local tomography
iff, for every pair of states
$\rho, \rho' \in \mathsf{St}(\mathrm{A} \otimes \mathrm{B})$, the condition

\[ (a \otimes b|\rho) = (a \otimes b|\rho') \qquad
   \forall a \in \mathsf{Eff}(\mathrm{A}), \ \forall b \in \mathsf{Eff}(\mathrm{B}) \tag{23} \]

implies

\[ \rho = \rho' . \tag{24} \]

More generally, we have the following

Definition 8. An $N$-partite system
$\mathrm{A} = \mathrm{A}_1 \otimes \cdots \otimes \mathrm{A}_N$ allows for local
tomography iff for every $k \in \{1, \dots, N\}$ there exists a set of
measurement effects $M_k$ on system $\mathrm{A}_k$ such that the set
$M_1 \otimes \cdots \otimes M_N$ is tomographically complete.

For a given OPT, it is easy to see that the following conditions are
equivalent:

1. every multipartite system allows for local tomography

2. every bipartite system allows for local tomography.

In other words, the possibility of local tomography for arbitrary composite
systems can be established by just checking bipartite systems.

3.2.4. State discrimination

The task of state discrimination can be presented as a game featuring a
player and a referee. The referee prepares a physical system $\mathrm{A}$ in a
state $\rho_x$, belonging to some set $\{\rho_x \;|\; x \in X\}$ known to the
player. The player is asked to guess the label $x$. In order to do that, she
performs a measurement $\boldsymbol{m}$ with outcomes in $X$: upon finding the
outcome $x'$, she will guess that the state was $\rho_{x'}$. If the player
guesses right every time, we say that the states are perfectly
distinguishable:

Definition 9. The states $\{\rho_x \;|\; x \in X\}$ are perfectly
distinguishable iff there exists a measurement $\boldsymbol{m}$ such that

\[ (m_x|\rho_{x'}) = \delta_{x,x'} \qquad \forall x, x' \in X . \]

When this is the case, we say that $\boldsymbol{m}$ is a discriminating
measurement.

Note that, in order to be perfectly distinguishable, the states must be

1. normalized, namely $\|\rho_x\| = 1 \ \forall x \in X$, where $\|\cdot\|$ is
the operational norm given by
$\|\rho\| = \sup_{a \in \mathsf{Eff}(\mathrm{A})} (a|\rho)$

2. non-internal: indeed, if a state $\rho_{x'}$ is internal, then
$(m_x|\rho_{x'}) = 0$ implies $m_x = 0$, in contradiction with the condition
$(m_x|\rho_x) = 1$.

Note that a priori an OPT may not have any distinguishable states at all.
However, the existence of distinguishable states is essential if we want our
theory to include classical computation and classical information theory.


3.2.5. Ideal compression 

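A preparation-test
$\boldsymbol{\rho} \in \mathsf{Tests}(\mathrm{I} \to \mathrm{A}, X)$ can be
thought of as describing a source of information. An interesting question is
how well such information can be transferred from the original system to
another physical support, say system $\mathrm{B}$. An encoding of the
preparation-test $\boldsymbol{\rho}$ is a deterministic transformation
$\mathcal{E} \in \mathsf{DetTransf}(\mathrm{A} \to \mathrm{B})$, which transforms
$\boldsymbol{\rho}$ into a new preparation-test
$\boldsymbol{\rho}' := \{\mathcal{E} \circ \rho_x\}_{x \in X}$. The states
$\{\mathcal{E} \circ \rho_x \;|\; x \in X\}$ are called codewords.

The ideal property of an encoding is to be lossless, in the following sense:

Definition 10. An encoding
$\mathcal{E} \in \mathsf{DetTransf}(\mathrm{A} \to \mathrm{B})$ is lossless for
the preparation-test
$\boldsymbol{\rho} \in \mathsf{Tests}(\mathrm{I} \to \mathrm{A}, X)$ iff there
exists a deterministic transformation
$\mathcal{D} \in \mathsf{DetTransf}(\mathrm{B} \to \mathrm{A})$, called the
decoding, such that

\[ \mathcal{D} \, \mathcal{E} \, \rho_x = \rho_x \qquad \forall x \in X . \tag{25} \]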
We say that

• $\mathcal{E}$ is a lossless encoding for the state
$\rho \in \mathsf{DetSt}(\mathrm{A})$ iff $\mathcal{E}$ is a lossless encoding
for every ensemble decomposition of $\rho$.

• $\mathcal{E}$ is a lossless encoding of system $\mathrm{A}$ into system
$\mathrm{B}$ iff $\mathcal{E}$ is a lossless encoding for all states
$\rho \in \mathsf{DetSt}(\mathrm{A})$.

The notion of encoding offers an operational way to compare the size of
different systems: naturally, we can say that system $\mathrm{A}$ is no larger
than system $\mathrm{B}$ iff there exists a lossless encoding of $\mathrm{A}$
into $\mathrm{B}$.

Among all possible encodings, we now consider the compressions:

Definition 11. A compression of system $\mathrm{A}$ into system $\mathrm{B}$ is
an encoding $\mathcal{E} \in \mathsf{DetTransf}(\mathrm{A} \to \mathrm{B})$ where
$\mathrm{B}$ is no larger than $\mathrm{A}$.

How much can we compress a given state? The ultimate limit to compression
is when every state of system $\mathrm{B}$ is proportional to a codeword, i.e.
when every state $\sigma \in \mathsf{St}(\mathrm{B})$ can be written as
$\sigma = \lambda \, \mathcal{E} \rho_{x_0}$, for some scaling constant
$\lambda > 0$ and some state $\rho_{x_0}$ belonging to some ensemble
decomposition of $\rho$. When this is the case, we say that the compression
$\mathcal{E}$ is maximally efficient. Summing up, we have the following

Definition 12. A transformation
$\mathcal{E} \in \mathsf{DetTransf}(\mathrm{A} \to \mathrm{B})$ is an ideal
compression of the state $\rho \in \mathsf{DetSt}(\mathrm{A})$ iff it is lossless
and maximally efficient.

3.2.6. Simulating preparations 

A state can be prepared in many different ways. For example, a state
$\rho_{\mathrm{A}}$ could be prepared by a circuit that involves many auxiliary
systems, which interact with $\mathrm{A}$ and are finally discarded. We refer
to these systems as the environment and describe them collectively as a single
system $\mathrm{E}$. Assuming that the system and the environment are initially
uncorrelated, the fact that the circuit prepares the state $\rho_{\mathrm{A}}$
is expressed by the diagram, written here in formula form,

\[ \rho_{\mathrm{A}} = (\mathcal{I}_{\mathrm{A}} \otimes e) \, \mathcal{U} \, (\rho_0 \otimes \eta_0) , \tag{26} \]

where $\rho_0$ and $\eta_0$ are the initial states of system and environment,
respectively, $\mathcal{U}$ is a transformation representing all the
system-environment interaction, and $e$ is some effect. By defining the state
$\rho_{\mathrm{AE}} := \mathcal{U} (\rho_0 \otimes \eta_0)$, the circuit of
Eq. (26) can be simplified to

\[ \rho_{\mathrm{A}} = (\mathcal{I}_{\mathrm{A}} \otimes e) \, \rho_{\mathrm{AE}} . \tag{27} \]

To capture the idea that the environment is discarded, we require the effect
$e$ to be deterministic:

Definition 13. A simulation of the preparation $\rho_{\mathrm{A}}$ is a triple
$(\mathrm{E}, \rho_{\mathrm{AE}}, e)$ where $\mathrm{E}$ is a system,
$\rho_{\mathrm{AE}}$ is a state of $\mathrm{A} \otimes \mathrm{E}$, and $e$ is a
deterministic effect satisfying Eq. (27). If the state $\rho_{\mathrm{AE}}$ is
pure, we say that $(\mathrm{E}, \rho_{\mathrm{AE}}, e)$ is a pure simulation or,
more concisely, a purification of $\rho_{\mathrm{A}}$.

Purifications arise, for example, when we start from a pure product state
$\alpha_0 \otimes \eta_0 \in \mathsf{PurSt}(\mathrm{A} \otimes \mathrm{E})$ and
evolve it through a reversible transformation $\mathcal{U}$. A purification
gives the agent maximal control over the process of preparation: indeed, an
agent possessing systems $\mathrm{A}$ and $\mathrm{E}$ can be sure that no side
information can hide outside her laboratory.

Given the importance of purifications, it is natural to ask how many of them
can be found for a given state. From a purification there are two trivial ways
to generate new ones:

1. by transforming the environment with a reversible transformation
$\mathcal{U}_{\mathrm{E}}$ such that $(e| \, \mathcal{U}_{\mathrm{E}} = (e|$, and

2. by appending a dummy system $\mathrm{D}$ to the environment, prepared in a
pure deterministic state $\delta_{\mathrm{D}}$ such that
$\rho_{\mathrm{AE}} \otimes \delta_{\mathrm{D}}$ is pure.

We say that a pure simulation is essentially unique if it is unique up to
trivial transformations:


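Definition 14. A state $\rho_{\mathrm{A}}$ has an essentially unique
purification iff for every two purifications $(\mathrm{E}, \Psi, e)$ and
$(\mathrm{E}', \Psi', e')$ with $\mathrm{E} = \mathrm{E}'$ one has

\[ \Psi' = (\mathcal{I}_{\mathrm{A}} \otimes \mathcal{U}_{\mathrm{E}}) \, \Psi \tag{28} \]

and

\[ (e'| \, \mathcal{U}_{\mathrm{E}} = (e| \tag{29} \]

for some reversible transformation $\mathcal{U}_{\mathrm{E}}$.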


4. The principles 

We are now ready to state our principles for quantum theory. We divide
them into five Axioms and one Postulate. The five Axioms are

A1 Causality. No signal can be sent from the future to the past.

A2 Purity of Composition. No side information can hide in the composition of
two pure transformations.

A3 Local Tomography. State tomography can be performed with only local
measurements.

A4 Perfect State Discrimination. Every normalized non-internal state can be
perfectly distinguished from some other state.

A5 Ideal Compression. Every state can be compressed in an ideal way.

The five Axioms express generic and rather unsurprising features, which are
common to classical and quantum theory. We regard the theories satisfying
these axioms as standard. The Postulate is

P6 Purification. Every preparation can be simulated via a pure preparation
in an essentially unique way.

Purification brings in a radically non-classical feature: the idea that
randomness can be simulated through the preparation of pure states. We will
see that this feature singles out quantum theory uniquely among all standard
OPTs.

[Footnote] It turns out that the second condition is automatically satisfied
if the theory satisfies the Causality axiom (see the next section).

[Footnote] We differentiate the names in order to highlight the different
roles of these principles in our reconstruction. Mathematically, there is no
difference between axioms, postulates, background assumptions, and
requirements in the OPT framework (all of them are "axioms"). The point of
using different names is just to provide a more intuitive picture.


4.1. Causality

Causality states that signals cannot be sent from the future to the past. To
check this condition, it is sufficient to look at a special class of circuits,
consisting of a single preparation-test, followed by a single
observation-test. Precisely, we have the following

Proposition 1. An OPT satisfies Causality if and only if for every system
$\mathrm{A} \in \mathsf{Sys}$, every preparation-test
$\boldsymbol{\rho} \in \mathsf{Tests}(\mathrm{I} \to \mathrm{A}, X)$ and every
pair of observation-tests
$\boldsymbol{m}_0 \in \mathsf{Tests}(\mathrm{A} \to \mathrm{I}, Y_0)$ and
$\boldsymbol{m}_1 \in \mathsf{Tests}(\mathrm{A} \to \mathrm{I}, Y_1)$ one has

\[ p(x|\boldsymbol{m}_0) = p(x|\boldsymbol{m}_1) \qquad \forall x \in X , \]

with $p(x|\boldsymbol{m}_i) := \sum_{y_i \in Y_i} (m_{i,y_i}|\rho_x)$,
$i \in \{0,1\}$.

An even simpler condition for causality is given by

Proposition 2. A theory satisfies Causality if and only if every system
$\mathrm{A}$ has a unique deterministic effect
$e_{\mathrm{A}} \in \mathsf{DetEff}(\mathrm{A})$.

In categorical terms, the uniqueness of the deterministic effect can be
phrased as "terminality of the tensor unit" in the category of deterministic
transformations $\mathsf{DetTransf}$. Categories where the tensor unit is
terminal have been introduced by Coecke and Lal, who named them causal
categories.

Recall that deterministic effects can be used to describe "discarding
operations", whereby a physical system is eliminated from the description.
Now, Causality is equivalent to the statement that every physical system can
be discarded in a unique way. Thanks to Causality, we can define the marginals
of a bipartite state in a canonical way:

Definition 15. Let $\rho_{\mathrm{AB}}$ be a state of system
$\mathrm{A} \otimes \mathrm{B}$. The marginal of $\rho_{\mathrm{AB}}$ on system
$\mathrm{A}$ is the state $\rho_{\mathrm{A}}$ defined as

\[ \rho_{\mathrm{A}} := (\mathcal{I}_{\mathrm{A}} \otimes e_{\mathrm{B}}) \, \rho_{\mathrm{AB}} . \]

4.1.1. Causality and No-Signalling

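An important consequence of Causality is the impossibility of signalling
without interaction: in the absence of any interaction between system
$\mathrm{A}$ and system $\mathrm{B}$, it is impossible to influence the
probability distribution of a test on system $\mathrm{A}$ by performing tests
on system $\mathrm{B}$. The precise statement is the following

Proposition 3. For every state $\rho_{\mathrm{AB}}$ and every triple of tests
$\boldsymbol{\mathcal{A}} \in \mathsf{Tests}(\mathrm{A} \to \mathrm{A}', X)$,
$\boldsymbol{\mathcal{B}}_0 \in \mathsf{Tests}(\mathrm{B} \to \mathrm{B}_0, Y_0)$
and
$\boldsymbol{\mathcal{B}}_1 \in \mathsf{Tests}(\mathrm{B} \to \mathrm{B}_1, Y_1)$
one has

\[ p(x|\boldsymbol{\mathcal{B}}_0) = p(x|\boldsymbol{\mathcal{B}}_1) \qquad \forall x \in X , \]

with $p(x|\boldsymbol{\mathcal{B}}_i) := \sum_{y_i \in Y_i}
(e_{\mathrm{A}'} \otimes e_{\mathrm{B}_i}| \, (\mathcal{A}_x \otimes \mathcal{B}_{i,y_i}) \, |\rho_{\mathrm{AB}})$,
$i \in \{0,1\}$.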
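For readers who prefer a concrete instance, Proposition 3 can be checked numerically in ordinary quantum theory. The sketch below is an illustration under stated assumptions: the choice of a maximally entangled two-qubit state, of projective measurements, and all function names are ours. Alice measures in the computational basis, Bob measures either in the computational basis or in the $\{|+\rangle, |-\rangle\}$ basis, and Alice's marginal is computed in both cases.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)


def projectors(*kets):
    """Rank-one projectors |k><k| for the given kets."""
    return [np.outer(k, k.conj()) for k in kets]


# Maximally entangled state of two qubits (illustrative choice).
phi_plus = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_ab = np.outer(phi_plus, phi_plus.conj())

alice_test = projectors(ket0, ket1)
bob_tests = [projectors(ket0, ket1), projectors(plus, minus)]


def alice_marginal(rho, a_test, b_test):
    """p(x | B_i): sum over Bob's outcomes of the joint probabilities."""
    return [sum(np.trace(np.kron(a, b) @ rho).real for b in b_test)
            for a in a_test]


marginals = [alice_marginal(rho_ab, alice_test, b) for b in bob_tests]
print(marginals)  # the two rows coincide
```

Both choices of Bob's measurement give Alice the same marginal, as the proposition requires; the joint statistics, of course, do depend on Bob's choice.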


4.1.2. Causality and conditional tests

We introduced Causality as a negative statement:

C: the choice of tests performed in the future cannot affect the outcome
probabilities of tests performed in the past.

The axiom can be reformulated in a positive, and slightly stronger way:

C': the outcomes of tests performed in the past can affect the choice of tests
performed in the future.

Technically, Condition C' establishes the possibility of performing
conditional tests, defined as follows:

Definition 16. Given a test
$\boldsymbol{\mathcal{T}} \in \mathsf{Tests}(\mathrm{A} \to \mathrm{B}, X)$ and a
collection of tests
$\{\boldsymbol{\mathcal{S}}_x \in \mathsf{Tests}(\mathrm{B} \to \mathrm{C}, Y_x) \;|\; x \in X\}$,
the conditional test associated to them is the collection of transformations

\[ \{\boldsymbol{\mathcal{S}}_x\} \otimes \boldsymbol{\mathcal{T}} :=
   \left\{ \mathcal{S}^{(x)}_{y_x} \circ \mathcal{T}_x \;\middle|\; x \in X ,\ y_x \in Y_x \right\} . \]

Condition C' states that such a collection is actually a test, meaning that

1. the set $Z = \{(x, y_x) \;|\; x \in X ,\ y_x \in Y_x\}$ belongs to
$\mathsf{Outcomes}$, and

2. the collection $\{\boldsymbol{\mathcal{S}}_x\} \otimes \boldsymbol{\mathcal{T}}$
belongs to $\mathsf{Tests}(\mathrm{A} \to \mathrm{C}, Z)$.

The relation between C and C' is the following:

1. C' implies C,

2. C implies that the theory can be enlarged to another theory satisfying
C'; thanks to C, all conditional tests can be included without losing the
consistency of the probabilistic structure.

Since conditional tests can be included, we will always assume that they are
included, i.e. we will take the validity of C' as part of the Causality
package.

4.1.3. Convexity

The ability to perform conditional tests leads naturally to convexity of the
sets of physical transformations. This result can be obtained in two steps:

1. Under the standing assumptions that the theory is not deterministic and
that the set $\mathsf{Transf}(\mathrm{I} \to \mathrm{I})$ is closed, we obtain
that $\mathsf{Transf}(\mathrm{I} \to \mathrm{I})$ is the whole interval $[0,1]$.
In other words, every number in the interval $[0,1]$ can be interpreted as the
probability of some outcome in some test allowed by the theory.

2. Given two transformations
$\mathcal{T}_0, \mathcal{T}_1 \in \mathsf{Transf}(\mathrm{A} \to \mathrm{B})$,
the convex combination $p \, \mathcal{T}_0 + (1 - p) \, \mathcal{T}_1$ can be
generated by

(a) performing a binary test with the outcomes 0 and 1 generated with
probabilities $p_0 = p$ and $p_1 = 1 - p$

(b) conditionally on the occurrence of the outcome $i$, performing a test
$\boldsymbol{\mathcal{T}}_i$ containing the transformation $\mathcal{T}_i$

(c) coarse-graining over the appropriate outcomes of the conditional test.

The above observations show that convexity need not be assumed from the
start, but can be derived from non-determinism and Causality (in the positive
formulation C'), under the standard assumption that the set of probabilities
generated by tests in the theory is closed.
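Steps (a) to (c) can be mimicked in a toy classical setting, where a transformation of a bit is a $2 \times 2$ matrix with non-negative entries and a test is a list of such matrices whose sum is column-stochastic. The following Python sketch is purely illustrative: the matrices, the value of $p$, and the helper names are hypothetical choices, not objects defined in the text.

```python
import numpy as np


def conditional_test(coin, tests):
    """Outcome i of the coin (probability coin[i]) selects the test tests[i]."""
    return {(i, j): coin[i] * t
            for i, test in enumerate(tests) for j, t in enumerate(test)}


def coarse_grain(test, outcomes):
    """Sum the transformations labelled by the given outcomes."""
    return sum(test[o] for o in outcomes)


# Two hypothetical tests on a classical bit, each written as [T_i, rest_i].
T0 = np.array([[0.5, 0.0], [0.0, 0.5]])
R0 = np.array([[0.0, 0.5], [0.5, 0.0]])
T1 = np.array([[0.3, 0.3], [0.0, 0.0]])
R1 = np.array([[0.0, 0.0], [0.7, 0.7]])

p = 0.25
cond = conditional_test([p, 1 - p], [[T0, R0], [T1, R1]])   # steps (a) and (b)
mixture = coarse_grain(cond, [(0, 0), (1, 0)])              # step (c)
total = coarse_grain(cond, list(cond))                      # all outcomes merged

print(mixture)
print(total.sum(axis=0))  # columns sum to one: the conditional test is normalized
```

The coarse-grained transformation coincides with $p \, \mathcal{T}_0 + (1-p) \, \mathcal{T}_1$, and merging all outcomes gives a deterministic (column-stochastic) map, as expected of a legitimate test.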

4.1.4. Rescaling

In addition to convexity, conditional tests guarantee that every state is
proportional to a normalized state. Specifically, given a state $\rho$ of a
generic system $\mathrm{A}$, one can define the normalized state
$\bar{\rho} := \rho / (e_{\mathrm{A}}|\rho)$. An approximate way to prepare the
state $\bar{\rho}$ is to

1. pick a binary test $\{\rho_0, \rho_1\}$ such that $\rho_1 = \rho$

2. perform it $N$ times, generating a string of outcomes
$(x_1, x_2, \dots, x_N)$

3. perform a conditional test that discards $N - 1$ systems, keeping only a
system $i$ such that $x_i = 1$, if such a system exists, or otherwise keeping
only the first system

4. coarse-grain over all outcomes, thus obtaining the deterministic state

\[ \rho_N := (1 - p_N) \, \bar{\rho} + p_N \, \bar{\rho}_0 , \qquad
   p_N = (e_{\mathrm{A}}|\rho_0)^N , \]

where $\bar{\rho}_0 := \rho_0 / (e_{\mathrm{A}}|\rho_0)$.

Clearly, the state $\rho_N$ converges to $\bar{\rho}$ when $N$ goes to
infinity. Hence, the standard assumption that the set of states is closed
guarantees that $\bar{\rho}$ is a state allowed by the theory.
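The rate of convergence is easy to see numerically. In the sketch below, which assumes a classical system whose states are sub-normalized probability vectors (so that the deterministic effect simply sums the entries), we evaluate the formula for $\rho_N$ for a hypothetical binary test and track the distance from the target $\bar{\rho}$. The numbers and function names are illustrative only.

```python
def normalize(v):
    """Divide a sub-normalized vector by (e_A|v), the sum of its entries."""
    total = sum(v)
    return [x / total for x in v]


def rescaled_state(rho0, rho1, n):
    """State obtained after n repetitions: (1 - p_n) * bar(rho1) + p_n * bar(rho0)."""
    p_n = sum(rho0) ** n          # probability that outcome 1 never occurs
    bar0, bar1 = normalize(rho0), normalize(rho1)
    return [(1 - p_n) * b1 + p_n * b0 for b0, b1 in zip(bar0, bar1)]


# Hypothetical binary test {rho0, rho1}: the two vectors sum to a normalized state.
rho1 = [0.1, 0.3]   # occurs with probability 0.4
rho0 = [0.5, 0.1]   # occurs with probability 0.6
target = normalize(rho1)

errors = []
for n in (1, 5, 20, 80):
    approx = rescaled_state(rho0, rho1, n)
    errors.append(max(abs(a - t) for a, t in zip(approx, target)))
print(errors)  # decreases geometrically, like 0.6 ** n
```

The error is proportional to $p_N = (e_{\mathrm{A}}|\rho_0)^N$, so it vanishes exponentially fast in the number of repetitions.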

4.2. Purity of Composition

Purity of Composition is a very primitive rule about how information
propagates in time. Mathematically, the axiom consists of the implication

\[ \mathcal{A} \in \mathsf{PurTransf}(\mathrm{A} \to \mathrm{B}), \quad
   \mathcal{B} \in \mathsf{PurTransf}(\mathrm{B} \to \mathrm{C})
   \qquad \Longrightarrow \qquad
   \mathcal{B} \circ \mathcal{A} \in \mathsf{PurTransf}(\mathrm{A} \to \mathrm{C}) , \]

required to be valid for all systems
$\mathrm{A}, \mathrm{B}, \mathrm{C} \in \mathsf{Sys}$ and for all pure
transformations $\mathcal{A}$ and $\mathcal{B}$.

Think of a world where this were not the case. In that world, an agent Alice
could perform a test
$\boldsymbol{\mathcal{A}} \in \mathsf{Tests}(\mathrm{A} \to \mathrm{B}, X)$ with
such a degree of control that, upon knowing the outcome, she could not
possibly know better what happened to her system. Immediately after, another
agent Bob could perform another test
$\boldsymbol{\mathcal{B}} \in \mathsf{Tests}(\mathrm{B} \to \mathrm{C}, Y)$, also
having maximal knowledge of the system's conditional evolution. Still, some of
the resulting transformations $\mathcal{B}_y \mathcal{A}_x$ may not be pure.
This means that $\mathcal{B}_y \mathcal{A}_x$ can be simulated by a third
party, Charlie, by performing one test
$\boldsymbol{\mathcal{C}} = \{\mathcal{C}_z\}_{z \in Z}$ and joining together the
outcomes in a suitable subset $S_{xy} \subset Z$:

\[ \mathcal{B}_y \mathcal{A}_x = \sum_{z \in S_{xy}} \mathcal{C}_z . \tag{30} \]

Although this scenario is logically conceivable, it raises a puzzling
question: What is the extra information about? Which physical parameters
correspond to the outcome $z$? Surely the information is not about what
happened in the first step, because Alice already had maximal knowledge about
this. Nor is it about what happened in the second step, because Bob has
maximal information about that. The outcome $z$ has to specify a feature of
how the two time steps interacted together: in a sense, a kind of information
that is non-local in time. Quantum theory is non-local, but not in such an
extreme way! Indeed, pure transformations in quantum theory are described by
completely positive maps with a single Kraus operator, i.e. of the form
$\mathcal{A}_x(\cdot) = A_x \cdot A_x^\dagger$ and
$\mathcal{B}_y(\cdot) = B_y \cdot B_y^\dagger$, and clearly the composition of
two pure transformations is still pure:
$\mathcal{B}_y \mathcal{A}_x(\cdot) = (B_y A_x) \cdot (B_y A_x)^\dagger$. Purity
of Composition guarantees this property at the level of first principles.
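The quantum statement can be verified numerically. A standard fact, which we use here as an assumption external to the text, is that a completely positive map admits a single Kraus operator exactly when its Choi matrix has rank one. The NumPy sketch below builds the Choi matrix of the composition of two randomly chosen single-Kraus maps and, for contrast, of an equal mixture of two such maps. All names and the random operators are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_kraus(d_out, d_in):
    """A random operator rescaled to spectral norm 1 (trace non-increasing map)."""
    k = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
    return k / np.linalg.norm(k, 2)


def apply(kraus, rho):
    """Single-Kraus map rho -> K rho K^dagger."""
    return kraus @ rho @ kraus.conj().T


def choi(channel, d_in):
    """Choi matrix: the block (i, j) is channel(|i><j|)."""
    blocks = []
    for i in range(d_in):
        row = []
        for j in range(d_in):
            e = np.zeros((d_in, d_in), dtype=complex)
            e[i, j] = 1.0
            row.append(channel(e))
        blocks.append(row)
    return np.block(blocks)


A, A_other = random_kraus(3, 2), random_kraus(3, 2)
B = random_kraus(2, 3)

composed = lambda rho: apply(B, apply(A, rho))                      # pure after pure
mixed = lambda rho: 0.5 * apply(A, rho) + 0.5 * apply(A_other, rho)  # coarse-graining

rank_composed = np.linalg.matrix_rank(choi(composed, 2), tol=1e-10)
rank_mixed = np.linalg.matrix_rank(choi(mixed, 2), tol=1e-10)
print(rank_composed, rank_mixed)
```

The composition has Choi rank one, so it is again of single-Kraus form, whereas the mixture of two different single-Kraus maps has rank two and therefore leaves room for side information.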


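4.3. Local Tomography

Local Tomography implies that even if a state is entangled, the information
it contains can be extracted by local measurements. This fact reconciles the
holism of entanglement and the reductionist idea that the full information
about a composite system can be obtained by studying its parts.

Mathematically, Local Tomography states that product effects form a
separating set for the vector space
$\mathsf{St}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{B})$. Equivalently (recall
that the state spaces are finite-dimensional), they form a spanning set for the
dual space
$\mathsf{St}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{B})^* = \mathsf{Eff}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{B})$.
Hence, we must have the conditions

\[ \mathsf{Eff}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{B}) = \mathsf{Eff}_{\mathbb{R}}(\mathrm{A}) \otimes \mathsf{Eff}_{\mathbb{R}}(\mathrm{B})
   \qquad \text{and} \qquad
   \mathsf{St}_{\mathbb{R}}(\mathrm{A} \otimes \mathrm{B}) = \mathsf{St}_{\mathbb{R}}(\mathrm{A}) \otimes \mathsf{St}_{\mathbb{R}}(\mathrm{B}) , \tag{31} \]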

where $\otimes$ on the r.h.s. denotes the tensor product of finite-dimensional
vector spaces. Eq. (31) implies that the dimensions of the vector spaces in
question satisfy the product relation [25]

\[ D_{\mathrm{A} \otimes \mathrm{B}} = D_{\mathrm{A}} D_{\mathrm{B}} . \tag{32} \]

Moreover, a generic state
$\rho \in \mathsf{St}(\mathrm{A} \otimes \mathrm{B})$ and a generic effect
$m \in \mathsf{Eff}(\mathrm{A} \otimes \mathrm{B})$ can be expanded as

\[ \rho = \sum_{i,j} \rho_{ij} \, (v_i \otimes w_j)
   \qquad \text{and} \qquad
   m = \sum_{i,j} m_{ji} \, (v_i^* \otimes w_j^*) , \tag{33} \]

where $[\rho_{ij}]$ and $[m_{ij}]$ are real matrices,
$\{v_i\}_{i=1}^{D_{\mathrm{A}}}$ and $\{w_j\}_{j=1}^{D_{\mathrm{B}}}$ are bases for
the vector spaces $\mathsf{St}_{\mathbb{R}}(\mathrm{A})$ and
$\mathsf{St}_{\mathbb{R}}(\mathrm{B})$, respectively, and
$\{v_i^*\}_{i=1}^{D_{\mathrm{A}}}$ and $\{w_j^*\}_{j=1}^{D_{\mathrm{B}}}$ are the
dual bases, defined by the relations $(v_i^*|v_k) = \delta_{ik}$ and
$(w_j^*|w_l) = \delta_{jl}$, respectively. As a result, the probability of the
effect $m$ on the state $\rho$ can be expressed as

\[ (m|\rho) = \mathrm{Tr}[m \rho] , \tag{34} \]

having committed a little abuse of notation in using the letter $m$
(respectively, $\rho$) both for the effect (respectively, state) and for the
corresponding matrix $[m_{ij}]$ (respectively, $[\rho_{ij}]$).

Finally, the decomposition in Eq. (33) implies the following

Theorem 1. In a theory satisfying Local Tomography, the correspondence
between a transformation
$\mathcal{E} \in \mathsf{Transf}(\mathrm{A} \to \mathrm{B})$ and the linear map
it induces,
$\mathcal{E} : \mathsf{St}_{\mathbb{R}}(\mathrm{A}) \to \mathsf{St}_{\mathbb{R}}(\mathrm{B})$,
is invertible.

In other words, Local Tomography guarantees that physical transformations
can be characterized in the simplest possible way: by preparing a set of input
states and performing a set of measurements on the output.

[Footnote] Recall that we are assuming that the state spaces are
finite-dimensional.
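The product relation (32) can be tested by simple parameter counting. In the sketch below we take $D$ to be the real dimension of the space spanned by the states: $d^2$ for Hermitian $d \times d$ matrices (ordinary quantum theory) and $d(d+1)/2$ for real symmetric $d \times d$ matrices (quantum theory on real Hilbert spaces). These two counting formulas are standard linear algebra; the function names are ours.

```python
def dim_complex(d):
    """Real dimension of the space of d x d Hermitian matrices."""
    return d * d


def dim_real(d):
    """Real dimension of the space of d x d real symmetric matrices."""
    return d * (d + 1) // 2


report = []
for d_a, d_b in [(2, 2), (2, 3), (3, 3)]:
    complex_ok = dim_complex(d_a * d_b) == dim_complex(d_a) * dim_complex(d_b)
    real_gap = dim_real(d_a * d_b) - dim_real(d_a) * dim_real(d_b)
    report.append(((d_a, d_b), complex_ok, real_gap))
    print((d_a, d_b), complex_ok, real_gap)
```

For complex quantum theory the product relation always holds. For real symmetric matrices the composite system has more parameters than the local ones can account for (one extra parameter already for two rebits), which is the counting signature of a failure of Local Tomography.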

A remarkable example of a theory that does not satisfy Local Tomography is
quantum theory on real Hilbert spaces, RQT for short. In this theory, states
and effects are real symmetric matrices, and transformations are represented
by completely positive maps mapping symmetric matrices into symmetric
matrices. The failure of the relation
$D_{\mathrm{A} \otimes \mathrm{B}} = D_{\mathrm{A}} D_{\mathrm{B}}$ was first noted
by Araki [53]. More explicitly, Wootters noted that two different bipartite
states can be locally indistinguishable, as in the following extreme example:

\[ \rho = \tfrac{1}{2} |\Phi_+\rangle\langle\Phi_+| + \tfrac{1}{2} |\Psi_-\rangle\langle\Psi_-| ,
   \qquad
   \rho' = \tfrac{1}{2} |\Phi_-\rangle\langle\Phi_-| + \tfrac{1}{2} |\Psi_+\rangle\langle\Psi_+| , \tag{35} \]

with $|\Phi_\pm\rangle := (|0\rangle|0\rangle \pm |1\rangle|1\rangle)/\sqrt{2}$ and
$|\Psi_\pm\rangle := (|0\rangle|1\rangle \pm |1\rangle|0\rangle)/\sqrt{2}$. Here
the states $\rho$ and $\rho'$ have orthogonal support and therefore are
perfectly distinguishable. However, it is easy to check that one has

\[ \rho - \rho' = \frac{1}{2}
   \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \otimes
   \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]

and, therefore,
$\mathrm{Tr}[(\rho - \rho')(P_{\mathrm{A}} \otimes P_{\mathrm{B}})] = 0$ for every
pair of real symmetric matrices $P_{\mathrm{A}}$ and $P_{\mathrm{B}}$, because the
trace of the product of an antisymmetric and a symmetric matrix vanishes. In
other words, $\rho$ and $\rho'$ give exactly the same statistics for all
possible local measurements.

RQT has another, closely related quirk: two different transformations of
system $\mathrm{A}$ can act in the same way on all states of $\mathrm{A}$. For
example, consider the qubit channels $\mathcal{C}$ and $\mathcal{C}'$, whose
action on a generic $2 \times 2$ matrix is defined by

\[ \mathcal{C}(M) := \tfrac{1}{2} M + \tfrac{1}{2} Y M Y
   \qquad \text{and} \qquad
   \mathcal{C}'(M) := \tfrac{1}{2} Z M Z + \tfrac{1}{2} X M X , \]

$X$, $Y$, and $Z$ being the Pauli matrices. When acting on symmetric matrices,
the two channels give exactly the same output: one has
$\mathcal{C}(\tau) = \mathcal{C}'(\tau) = \mathrm{Tr}[\tau] \, I/2$ for every real
symmetric matrix $\tau$, and in particular $I/2$ for every real density
matrix. On the other hand, one has

\[ (\mathcal{C} \otimes \mathcal{I})(|\Phi_+\rangle\langle\Phi_+|) = \rho ,
   \qquad
   (\mathcal{C}' \otimes \mathcal{I})(|\Phi_+\rangle\langle\Phi_+|) = \rho' , \]

where $\rho$ and $\rho'$ are the two perfectly distinguishable states defined
in Eq. (35) above. This means that, in fact, the two transformations
$\mathcal{C}$ and $\mathcal{C}'$ are perfectly distinguishable with the help of
a reference system. For a more extensive discussion of tomography in RQT we
refer the reader to the references cited above and to the work of Hardy and
Wootters.
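All the claims in this example are finite-dimensional matrix identities, so they can be checked directly. The NumPy sketch below is an illustrative verification only (variable and function names are ours): it builds the two states of Eq. (35), confirms the antisymmetric product form of their difference, samples random real symmetric local operators, and applies the two channels with and without a reference qubit.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def proj(v):
    """Projector on the normalized version of v (basis order 00, 01, 10, 11)."""
    v = np.asarray(v, dtype=complex) / np.linalg.norm(v)
    return np.outer(v, v.conj())


phi_p, phi_m = proj([1, 0, 0, 1]), proj([1, 0, 0, -1])
psi_p, psi_m = proj([0, 1, 1, 0]), proj([0, 1, -1, 0])
rho = 0.5 * phi_p + 0.5 * psi_m
rho_prime = 0.5 * phi_m + 0.5 * psi_p

J = np.array([[0, -1], [1, 0]])
difference_ok = np.allclose(rho - rho_prime, 0.5 * np.kron(J, J))
overlap = abs(np.trace(rho @ rho_prime))        # zero: orthogonal supports

rng = np.random.default_rng(1)


def random_symmetric():
    m = rng.normal(size=(2, 2))
    return m + m.T


local_gap = max(
    abs(np.trace((rho - rho_prime) @ np.kron(random_symmetric(), random_symmetric())))
    for _ in range(100)
)


def apply(ops, state, ancilla_dim=1):
    """Equal mixture of the unitaries in ops, acting on the first factor only."""
    ext = [np.kron(k, np.eye(ancilla_dim)) for k in ops]
    return sum(0.5 * k @ state @ k.conj().T for k in ext)


C_OPS, C_PRIME_OPS = [I2, Y], [Z, X]
tau = np.array([[0.7, 0.2], [0.2, 0.3]])        # a real density matrix
same_locally = np.allclose(apply(C_OPS, tau), apply(C_PRIME_OPS, tau))
differ_globally = (np.allclose(apply(C_OPS, phi_p, 2), rho)
                   and np.allclose(apply(C_PRIME_OPS, phi_p, 2), rho_prime))
print(difference_ok, overlap, local_gap, same_locally, differ_globally)
```

The two states are orthogonal yet invisible to products of real symmetric operators, and the two channels agree on every real single-qubit input while producing orthogonal outputs on half of an entangled pair.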

4.4. Perfect State Discrimination

Perfect State Discrimination is an optimistic statement about the possibility
of encoding bits without error. It guarantees that every state that could be
part of a set of perfectly distinguishable states is indeed perfectly
distinguishable from some other state.

By virtue of Perfect State Discrimination, every normalized non-internal
state $\rho_0$ can be perfectly distinguished from some state $\rho_1$. As a
result, the two states $\rho_0$ and $\rho_1$ can be used to encode the value of
a bit without errors. It is easy to see that quantum theory satisfies the
axiom. Indeed, a density matrix is internal if and only if it has full rank.
Hence, a non-internal density matrix $\rho_0$ must have a non-trivial kernel,
so that every state $\rho_1$ with support in the kernel of $\rho_0$ will be
perfectly distinguishable from $\rho_0$.
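The quantum argument translates into a few lines of linear algebra. The sketch below, with a hypothetical rank-deficient qutrit state and helper names of our choosing, computes the projector on the kernel of $\rho_0$, uses it both to define $\rho_1$ and to build a two-outcome projective measurement, and tabulates the probabilities $(m_x|\rho_{x'})$.

```python
import numpy as np


def kernel_projector(rho, tol=1e-12):
    """Projector on the kernel of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(rho)
    ker = vecs[:, vals < tol]
    return ker @ ker.conj().T


# Hypothetical non-internal (rank-deficient) qutrit state.
rho0 = np.diag([0.7, 0.3, 0.0]).astype(complex)

m1 = kernel_projector(rho0)          # effect that clicks only on the kernel
m0 = np.eye(3) - m1                  # complementary effect
rho1 = m1 / np.trace(m1).real        # a state supported in the kernel of rho0

table = np.array([[np.trace(m @ r).real for r in (rho0, rho1)]
                  for m in (m0, m1)])
print(table)  # identity matrix: (m_x | rho_x') = delta_{x, x'}
```

The measurement $\{m_0, m_1\}$ is a discriminating measurement in the sense of Definition 9. If $\rho_0$ had full rank the kernel projector would vanish and the construction would fail, in line with the remark that internal states cannot be perfectly distinguished from anything.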

4.5. Ideal Compression

Ideal Compression expresses the idea that information is fungible, i.e.
independent of the physical support in which it is encoded. The axiom implies
non-trivial statements about the state spaces arising in the theory. For
example, suppose that the theory contains a system whose space of
deterministic states is a square. Then, the theory should also contain a
system whose space of deterministic states is a segment; in other words, the
theory should contain a classical bit. Indeed, only in this way could one
encode a side of the square in a lossless and maximally efficient way. More
generally, Ideal Compression requires that every face of the convex set of
deterministic states be in one-to-one correspondence with the set of
deterministic states of some smaller physical system.

Ideal Compression is clearly satisfied by quantum theory. Indeed, every
density matrix of rank $r$ can be compressed ideally to a density matrix of an
$r$-dimensional quantum system. For example, the two-qubit density matrix

\[ \rho = \begin{pmatrix}
   \rho_{00,00} & 0 & 0 & \rho_{00,11} \\
   0 & 0 & 0 & 0 \\
   0 & 0 & 0 & 0 \\
   \rho_{11,00} & 0 & 0 & \rho_{11,11}
   \end{pmatrix} \tag{36} \]

can be compressed ideally to the one-qubit density matrix

\[ \rho' = \begin{pmatrix}
   \rho_{00,00} & \rho_{00,11} \\
   \rho_{11,00} & \rho_{11,11}
   \end{pmatrix} \tag{37} \]

with encoding and decoding channels given by

\[ \mathcal{E}(\cdot) := V^\dagger (\cdot) V + \mathrm{Tr}[(I - V V^\dagger)(\cdot)] \, |0\rangle\langle 0| ,
   \qquad V := |0\rangle|0\rangle\langle 0| + |1\rangle|1\rangle\langle 1| , \]

\[ \mathcal{D}(\cdot) := V (\cdot) V^\dagger . \]
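The encoding and decoding maps above can be checked on a concrete density matrix. In the NumPy sketch below the numerical entries of $\rho$ are hypothetical and the function names are ours; the isometry $V$ and the two channels are transcribed from the formulas above.

```python
import numpy as np

# Isometry V = |0>|0><0| + |1>|1><1| from one qubit into two qubits (4 x 2 matrix).
V = np.zeros((4, 2), dtype=complex)
V[0, 0] = 1.0   # |00><0|
V[3, 1] = 1.0   # |11><1|
ket0_proj = np.array([[1, 0], [0, 0]], dtype=complex)


def encode(rho_2q):
    """E(rho) = V^dag rho V + Tr[(I - V V^dag) rho] |0><0|."""
    leak = np.trace((np.eye(4) - V @ V.conj().T) @ rho_2q)
    return V.conj().T @ rho_2q @ V + leak * ket0_proj


def decode(rho_1q):
    """D(rho') = V rho' V^dag."""
    return V @ rho_1q @ V.conj().T


# Hypothetical two-qubit state of the form (36).
rho = np.zeros((4, 4), dtype=complex)
rho[0, 0], rho[3, 3] = 0.6, 0.4
rho[0, 3], rho[3, 0] = 0.2 - 0.1j, 0.2 + 0.1j

compressed = encode(rho)
recovered = decode(compressed)
print(compressed)
print(np.allclose(recovered, rho))
```

The compressed state is the $2 \times 2$ matrix of Eq. (37) and decoding returns the original state. The second term of the encoding only matters for inputs outside the support of $\rho$: it keeps the map trace-preserving, as can be seen by applying `encode` to the maximally mixed two-qubit state.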


Note that Ideal Compression refers to a single-shot, zero-error scenario, i.e.
a scenario where the source is used only once and no decoding errors are
allowed. Such a scenario is different from the asymptotic scenario considered
in Shannon's and Schumacher's compression, wherein small decoding errors are
allowed, under the condition that they vanish in the asymptotic limit of
infinitely many uses of the same source.


4.6. Purification

While our first five axioms expressed standard requirements for information
processing, Purification brings in a radically new idea: at least in
principle, every state can be prepared by an agent who has maximal control
over all the systems involved in the preparation process. In short,
Purification allows us to harness randomness by controlling the environment.
The idea does not apply only to preparations, but also to arbitrary
deterministic transformations: combining Purification with Causality and Local
Tomography, we can prove the following

Theorem 2. For every deterministic transformation
$\mathcal{T} \in \mathsf{DetTransf}(\mathrm{A} \to \mathrm{A}')$, there exist two
systems $\mathrm{E}$ and $\mathrm{E}'$, a pure state
$\eta \in \mathsf{PurSt}(\mathrm{E})$, and a reversible transformation
$\mathcal{U} \in \mathsf{RevTransf}(\mathrm{A} \otimes \mathrm{E} \to \mathrm{A}' \otimes \mathrm{E}')$
such that

\[ \mathcal{T} = (\mathcal{I}_{\mathrm{A}'} \otimes e) \, \mathcal{U} \, (\mathcal{I}_{\mathrm{A}} \otimes \eta) , \tag{38} \]

where $e$ is the unique deterministic effect of system $\mathrm{E}'$.

In other words, Purification implies that every irreversible process can be
simulated through reversible interactions between the system and its
environment, with the environment initialized in a pure state. This result is
a necessary condition for the formulation of physical theories in which
elementary processes are reversible at the fundamental level.
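In quantum theory, Theorem 2 takes the familiar form of a unitary dilation of a channel. As a minimal illustration (the choice of channel and all names are ours, not taken from the text), the sketch below realizes the completely dephasing qubit channel, which erases the off-diagonal entries of a density matrix, by letting the system control a CNOT gate on an environment qubit initialized in the pure state $|0\rangle$ and then discarding the environment.

```python
import numpy as np


def discard_environment(m):
    """Partial trace over the second qubit of a two-qubit operator."""
    return np.trace(m.reshape(2, 2, 2, 2), axis1=1, axis2=3)


# Reversible interaction: CNOT with the system as control, environment as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
eta = np.array([[1, 0], [0, 0]], dtype=complex)   # pure environment state |0><0|


def dephase_via_dilation(rho):
    """Apply U to rho (x) eta, then discard the environment."""
    joint = CNOT @ np.kron(rho, eta) @ CNOT.conj().T
    return discard_environment(joint)


rho = np.array([[0.6, 0.3 - 0.1j], [0.3 + 0.1j, 0.4]])
out = dephase_via_dilation(rho)
print(out)  # diagonal part of rho
```

The irreversible loss of coherence on the system is reproduced by a reversible gate plus a discarded environment, which is exactly the structure of Eq. (38) in this special case.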


Purification is known to be satisfied by quantum mechanics. For example,
consider a single-qubit mixed state, diagonalized as

\[ \rho = p_0 |0\rangle\langle 0| + p_1 |1\rangle\langle 1| , \tag{39} \]

for some suitable orthonormal basis $\{|0\rangle, |1\rangle\}$. A purification
of the state $\rho$ can be obtained by adding a second qubit and by preparing
the two qubits in the pure state

\[ |\Psi\rangle := \sqrt{p_0} \, |0\rangle|0\rangle + \sqrt{p_1} \, |1\rangle|1\rangle . \tag{40} \]

Indeed, it is immediate to see that $\rho$ is the marginal of the density
matrix $|\Psi\rangle\langle\Psi|$ on the first qubit. In addition, any other
purification $|\Psi'\rangle$ that uses a single qubit as the purifying system
must be of the form $|\Psi'\rangle = (I \otimes U) |\Psi\rangle$ for some
unitary matrix $U$.
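Both statements are easy to confirm numerically in one direction: the state (40) has marginal (39), and acting with a unitary on the purifying qubit yields another purification of the same state. The sketch below uses hypothetical values of $p_0, p_1$ and the Hadamard matrix as an example of $U$; it does not attempt to prove the converse (uniqueness) statement.

```python
import numpy as np

p0, p1 = 0.8, 0.2                          # hypothetical eigenvalues of rho
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

psi = np.sqrt(p0) * np.kron(ket0, ket0) + np.sqrt(p1) * np.kron(ket1, ket1)


def marginal_first_qubit(state_vector):
    """Reduced density matrix of the first qubit of a two-qubit pure state."""
    full = np.outer(state_vector, state_vector.conj())
    return np.trace(full.reshape(2, 2, 2, 2), axis1=1, axis2=3)


# Example unitary on the purifying qubit (Hadamard, an arbitrary choice).
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi_other = np.kron(np.eye(2), U) @ psi

rho_from_psi = marginal_first_qubit(psi)
rho_from_other = marginal_first_qubit(psi_other)
print(rho_from_psi)
print(np.allclose(rho_from_psi, rho_from_other))
```

The two pure states differ, yet they have the same marginal $\mathrm{diag}(p_0, p_1)$ on the first qubit, so both are purifications of $\rho$ related by a reversible transformation on the purifying system.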

In the quantum information community, taking purifications is a standard
approach to quantum communication, cryptography, and quantum error
correction. The approach is familiarly known by the nickname of "going to the
Church of the larger Hilbert space". Purification is known among
mathematicians as the Gelfand-Naimark-Segal construction.

Two important remarks are in order: 


1. Purification, entanglement, and quantum information. Purification is intimately
linked with the phenomenon of entanglement, namely the
existence of pure bipartite states $\Psi_{AB}$ that are not of the product form
$\Psi_{AB} = \alpha_A \otimes \beta_B$. In the OPT framework, the link is made precise by the following


Proposition 4. Let $\Theta$ be a theory satisfying Causality, Local Tomography,
and Purification. Then, there are only two alternatives: either $\Theta$ is
deterministic, or $\Theta$ exhibits entanglement.


Under our standing assumption that the theory is non-deterministic,
entanglement follows from Purification as a necessary consequence.
Entanglement is a very peculiar feature, far from what we experience
in our everyday life. How can we claim that we know A and B if we
do not know A alone? This puzzling feature had been noted already
in the early days of quantum theory, when Schrödinger famously wrote:
“Another way of expressing the peculiar situation is: the best possible
knowledge of a whole does not necessarily include the best possible
knowledge of all its parts”. And, in the same paper: “I would not call
that one but rather the characteristic trait of quantum mechanics, the
one that enforces its entire departure from classical lines of thought”. In
a sense, our reconstruction can be considered as a mathematical proof
of Schrödinger's intuition: on the background of five standard axioms
satisfied by both classical theory and quantum theory, Purification is the
ingredient that allows one to reconstruct the Hilbert space framework and the
distinctive information-theoretic features of quantum theory. Combined
with Causality and Local Tomography, Purification already reproduces an
impressive list of quantum-like features, like no-cloning, no-programming,
information-disturbance tradeoff, no bit commitment, conclusive teleportation
and entanglement swapping, the reversible dilation of channels, the
state-transformation isomorphism, the structure of error correction, and
the structure of no-signalling channels [1].

Footnote: The expression “going to the Church of the larger Hilbert space” is
due to John Smolin, see e.g. the lecture notes [s^l .

Footnote: It is worth stressing that Schrödinger's paper was not just about the
existence of entangled states, but also about how entanglement interacted with
the reversible dynamics and with the process of measurement (cf. the notion of
steering, which made its first appearance in the very same paper).

2. Purification and the Many Worlds Interpretation. Pondering about the
meaning of Purification, one may be tempted to conclude that it favours
the Many Worlds Interpretation (MWI) of quantum mechanics [61]. In
fact, Purification is a feature of quantum theory, and, as such, it does not
favour the MWI more than quantum theory itself does. Whether or not
quantum theory provides any evidence for many worlds is a debatable
point, but the validity of Purification is independent of such interpretative
issues. Furthermore, we stress that we did not phrase Purification as an
ontological statement about “how processes occur in nature”, but rather
as an operational statement about the agent's ability to simulate physical
processes with maximal control. Purification is compatible with the idea
that processes are reversible at the fundamental level, and its validity is
a necessary condition for building up a physical description of nature in
terms of pure states and reversible processes. Still, here we do not make
any commitment as to how processes are realized in nature, because this
would unnecessarily limit the range of application of our results.


5. The reconstruction of Quantum Theory 

Here we provide a summary of the reconstruction of Refs. [1, 2], highlighting
the key theorems and providing a guide to the original papers. The scope of 
the reconstruction is not just to derive the Hilbert space framework, but also to 
rebuild the key quantum features directly from first principles. Accordingly, we 
try to derive as much as possible of quantum theory directly from the axioms, 
leaving Hilbert spaces to the very end. We organize our results in six subsections:

1. Elementary facts.

2. Correlation structures. 

3. Distinguishability structures. 

4. Interaction between correlation and distinguishability structures. 

5. Qubit features. 

6. The density matrix. 


5.1. Elementary facts 

5.1.1. From Local Tomography 

Local Tomography implies a few useful facts:

1. If $\alpha \in \mathrm{St}(A)$ and $\beta \in \mathrm{St}(B)$ are pure, then also $\alpha \otimes \beta$ is pure.

2. Let $\rho_{AB}$ be a state of the composite system $A \otimes B$ and, assuming Causality,
let $\rho_A$ be its marginal on system $A$. If $\rho_A$ is pure, then $\rho_{AB}$ is a product
state.

3. If $\rho_A \in \mathrm{St}(A)$ and $\rho_B \in \mathrm{St}(B)$ are internal states, then also $\rho_A \otimes \rho_B$ is an
internal state.

4. Suppose that every system $A$ has a unique invariant state $\chi_A$, i.e. a unique
state satisfying the condition $\mathcal{U} \chi_A = \chi_A$ for every reversible transformation
$\mathcal{U}$. Then, $\chi_{A \otimes B} = \chi_A \otimes \chi_B$.


5.1.2. From Purification 

Purification has a few immediate consequences. First, all pure states of a 
given system are connected to one another through reversible transformations: 

Proposition 5. For every system $A \in \mathrm{Sys}$ and every pair of pure states
$\alpha, \alpha' \in \mathrm{PurSt}(A)$ there exists a reversible transformation $\mathcal{U}$ such that $\alpha' = \mathcal{U} \alpha$.


To prove this fact, it is enough to pick a system $B$ and a pure state
$\beta \in \mathrm{PurSt}(B)$, consider the states $\Psi = \alpha \otimes \beta$ and $\Psi' = \alpha' \otimes \beta$ as purifications of
$\beta$, and invoke the essential uniqueness of purification [Eq. (28)].
Mathematically, the above proposition expresses the fact that the action of the reversible
transformations is transitive on the manifold of pure states—a requirement that 
played an important role in many recent reconstructions, see e. g. 25, 3^, 2^. 
A byproduct of transitivity is 

Proposition 6. Every system $A \in \mathrm{Sys}$ has a unique invariant state $\chi_A$.


Finally, combining Ideal Compression and Purification it is easy to see that 
every state has a minimal purification, in the following sense 

Definition 17. Let $\Psi \in \mathrm{PurSt}(A \otimes B)$ be a pure state with marginals $\rho_A$ and
$\rho_B$ on systems $A$ and $B$, respectively. We say that $\Psi$ is a minimal purification
of $\rho_A$ iff $\rho_B$ is internal.


To construct a minimal purification, it is enough to take an arbitrary purification
and to compress the state of the purifying system.


5.2. Correlation structures 
5.2.1. Pure Steering 

One of the most important consequences of our axioms is that pure bipartite
states enable steering, namely the ability to remotely generate every desired
ensemble decomposition of a marginal state:


Proposition 7 (Pure Steering). Let $\Psi$ be a pure state of the composite system
$A \otimes B$, let $\rho$ be the marginal of $\Psi$ on system $A$, and let $\{\rho_x\}_{x \in X}$ be an
ensemble decomposition of $\rho$. Then there exists a measurement $b = \{b_x\}_{x \in X}$
such that

\[ (b_x|_B \, |\Psi)_{AB} = |\rho_x)_A \qquad \forall x \in X . \tag{41} \]
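A quantum instance of Pure Steering can be sketched as follows. For the pure state $|\Psi\rangle = \sum_i \sqrt{p_i}\,|i\rangle|i\rangle$ with full-rank marginal $\rho$, the effects $b_x = (\rho^{-1/2} \rho_x \rho^{-1/2})^T$ steer the ensemble $\{\rho_x\}$. This formula is the standard quantum construction and is assumed here rather than taken from the text; the spectrum `p` and the ensemble are illustrative choices.

```python
import numpy as np

p = np.array([0.7, 0.3])                    # spectrum of the marginal rho on A
rho = np.diag(p).astype(complex)
m = np.diag(np.sqrt(p)).astype(complex)     # m[a, b]: amplitudes of |Psi> on A (x) B

# A hypothetical target ensemble {rho_0, rho_1} with rho_0 + rho_1 = rho
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_0 = 0.4 * np.outer(plus, plus.conj())   # unnormalized state of weight 0.4
rho_1 = rho - rho_0

inv_sqrt = np.diag(1 / np.sqrt(p)).astype(complex)

def steering_effect(rho_x):
    """Effect on B that prepares the unnormalized state rho_x on A."""
    return (inv_sqrt @ rho_x @ inv_sqrt).T

def conditional_state(b):
    """Unnormalized state of A when the effect b occurs on B."""
    return m @ b.T @ m.conj().T

effects = [steering_effect(r) for r in (rho_0, rho_1)]
steered = [conditional_state(b) for b in effects]
```

The two effects are positive and sum to the identity, so they form a valid measurement on B, and `steered` reproduces the chosen ensemble on A.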


Pure steering is the essential ingredient for a number of major results. The
first result is the existence of pure, tomographically faithful states. A state
$\rho \in \mathrm{St}(A \otimes B)$ is called tomographically faithful for system $A$ iff the implication

\[ (\mathcal{T} \otimes \mathcal{I}_B) \, |\rho) = (\mathcal{T}' \otimes \mathcal{I}_B) \, |\rho) \quad \Longrightarrow \quad \mathcal{T} = \mathcal{T}' \tag{42} \]

holds for every system $C$ and every pair of transformations $\mathcal{T}$ and $\mathcal{T}'$ of type
$A \to C$. Thanks to Pure Steering and Local Tomography, we are able to construct
tomographically faithful states:


Proposition 8. Let $\rho_A$ be an internal state of system $A$ and let
$\Psi \in \mathrm{PurSt}(A \otimes B)$ be a purification of $\rho_A$. Then, $\Psi$ is tomographically faithful for system $A$.


The result can be improved by choosing a minimal purification: in this way,
the pure state $\Psi$ is faithful on both systems $A$ and $B$. We call such a state
doubly faithful.


5.2.2. Conjugate systems 

A canonical choice of doubly faithful state is obtained by picking a minimal
purification of the invariant state $\chi_A$. We denote such purification by
$\Phi \in \mathrm{PurSt}(A \otimes \bar A)$ and call system $\bar A$ the conjugate of system $A$. The name is
motivated by the following facts:

1. system $\bar A$ is uniquely defined, up to operational equivalence

2. the marginal of $\Phi$ on system $\bar A$ is the invariant state $\chi_{\bar A}$ (Corollary 46
of the original papers), meaning that we have $\bar{\bar A} = A$, up to operational equivalence.

Summarizing, the state $\Phi$ satisfies the relations

\[ (e|_{\bar A} \, |\Phi)_{A \bar A} = |\chi_A)_A \qquad \text{and} \qquad (e|_A \, |\Phi)_{A \bar A} = |\chi_{\bar A})_{\bar A} . \tag{43} \]

By analogy with quantum theory, we call $\Phi$ a Bell state.


5.2.3. The state-transformation isomorphism 

For a given transformation $\mathcal{T}$, we define the (generally unnormalized) state

\[ |\Phi_{\mathcal{T}}) := (\mathcal{T} \otimes \mathcal{I}_{\bar A}) \, |\Phi) \tag{44} \]

and call the correspondence $\mathcal{T} \mapsto \Phi_{\mathcal{T}}$ the state-transformation isomorphism.
Since the Bell state $\Phi$ is doubly faithful, the correspondence is one-to-one. In
quantum theory, the state-transformation isomorphism coincides with the Choi
isomorphism [63]. By analogy, we call the state $\Phi_{\mathcal{T}}$ the Choi state.
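In the quantum case the correspondence can be made concrete with a few lines of code. The sketch below builds the Choi state of a channel given by Kraus operators and recovers the action of the channel from it through $\mathcal{T}(\rho) = d \, \mathrm{Tr}_2[(I \otimes \rho^T)\,\Phi_{\mathcal{T}}]$. The dephasing channel, its parameter `q`, and the function names are assumptions chosen for the example.

```python
import numpy as np

d = 2
phi = np.eye(d, dtype=complex).reshape(d * d) / np.sqrt(d)   # Bell vector sum_i |i>|i> / sqrt(d)
bell = np.outer(phi, phi.conj())

def choi_state(kraus):
    """Choi state (T (x) I)(|Phi><Phi|) of the channel with the given Kraus operators."""
    out = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus:
        KI = np.kron(K, np.eye(d))
        out += KI @ bell @ KI.conj().T
    return out

def apply_from_choi(choi, rho):
    """Recover T(rho) from the Choi state alone."""
    X = np.kron(np.eye(d), rho.T) @ choi
    return d * np.einsum('aebe->ab', X.reshape(d, d, d, d))

q = 0.25                                     # dephasing probability (illustrative)
Z = np.diag([1, -1]).astype(complex)
dephasing = [np.sqrt(1 - q) * np.eye(d, dtype=complex), np.sqrt(q) * Z]
identity = [np.eye(d, dtype=complex)]

rho = np.array([[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]])
direct = sum(K @ rho @ K.conj().T for K in dephasing)
recovered = apply_from_choi(choi_state(dephasing), rho)
```

`recovered` matches `direct`, and the two channels have different Choi states, in line with the correspondence being one-to-one.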

A powerful byproduct of the state-transformation isomorphism is that the
normalized states completely identify the theory:

Theorem 3. Let $\Theta$ and $\Theta'$ be two theories with the same set of systems. If
the sets of normalized states of $\Theta$ and $\Theta'$ coincide for all systems, then the two
theories coincide.

Thanks to this result, deriving the density matrix representation of normalized
states is sufficient to derive the whole of quantum theory.

5.2.4. Conclusive entanglement swapping

An important consequence of Pure Steering is the possibility of entanglement
swapping, namely the possibility to generate entanglement remotely by
performing a joint measurement. Consider, as a prototype of entangled state,
the Bell state $\Phi$. Then, it is possible to show that there exists a pure effect
$E \in \mathrm{PurEff}(\bar A \otimes A)$ and a non-zero probability $p_A > 0$ such that

\[ (E|_{B_1 B_2} \, \big( |\Phi)_{A B_1} \otimes |\Phi)_{B_2 C} \big) = p_A \, |\Phi)_{A C} , \qquad B_1 \simeq C \simeq \bar A , \quad B_2 \simeq A . \tag{45} \]

This condition represents an instance of conclusive entanglement swapping:
conditionally on the occurrence of the effect $E$, the two systems $A$ and $C$ are
prepared in the Bell state, consuming the initial entanglement present in the
composite systems $A \otimes B_1$ and $B_2 \otimes C$.

The possibility of entanglement swapping follows easily from Pure Steering:
since the states $\chi_A$ and $\chi_{\bar A}$ are internal, Local Tomography implies that their
product $\chi_A \otimes \chi_{\bar A}$ is internal. Hence, there must exist a non-zero probability
$p_A > 0$ such that

\[ \chi_A \otimes \chi_{\bar A} = p_A \, \Phi + (1 - p_A) \, \tau , \tag{46} \]

for some state $\tau$. Applying Pure Steering (proposition 7) to the pure state
$\Phi \otimes \Phi$ and to the ensemble $\{p_A \Phi, (1 - p_A) \tau\}$ one can find a binary measurement
$\{E, e_{B_1} \otimes e_{B_2} - E\}$ such that the entanglement swapping condition (45) holds.
Using the fact that the state $\Phi \otimes \Phi$ is doubly faithful, it is easy to see that the
effect $E$ must be pure.

5.2.5. Conclusive teleportation 

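By the state-transformation isomorphism, conclusive entanglement swapping
is equivalent to conclusive teleportation, expressed by the condition

\[ \big( \mathcal{I}_A \otimes (E|_{\bar A A} \big) \big( |\Phi)_{A \bar A} \otimes \mathcal{I}_A \big) = p_A \, \mathcal{I}_A . \tag{47} \]

Indeed, the entanglement swapping condition (45) is equivalent to the condition
$\Phi_{\mathcal{T}} = \Phi_{\mathcal{T}'}$, with

\[ \mathcal{T} := \big( \mathcal{I}_A \otimes (E|_{\bar A A} \big) \big( |\Phi)_{A \bar A} \otimes \mathcal{I}_A \big) \qquad \text{and} \qquad \mathcal{T}' := p_A \, \mathcal{I}_A . \tag{48} \]

By the state-transformation isomorphism, $\Phi_{\mathcal{T}} = \Phi_{\mathcal{T}'}$ implies $\mathcal{T} = \mathcal{T}'$, which is
nothing but the teleportation diagram.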
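For a quantum system of Hilbert space dimension $d$, conclusive teleportation can be sketched by projecting the input and one half of a Bell pair onto the Bell state itself. The code below assumes this standard quantum realization (it is not taken from the text), and the function name and test state are illustrative; in this instance the success probability is $1/d^2$ for every input.

```python
import numpy as np

def teleport_conclusively(psi):
    """Teleport |psi> using a Bell pair and the Bell effect <Phi| on (Abar, input).

    Returns the normalized output state on A and the success probability.
    """
    d = psi.shape[0]
    bell = np.eye(d, dtype=complex) / np.sqrt(d)       # bell[a, b]: amplitudes of |Phi>
    joint = np.einsum('ab,c->abc', bell, psi)          # |Phi>_{A Abar} (x) |psi>_{in}
    out = np.einsum('bc,abc->a', bell.conj(), joint)   # apply <Phi| on (Abar, in)
    prob = np.vdot(out, out).real
    return out / np.sqrt(prob), prob

rng = np.random.default_rng(1)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)

state, prob = teleport_conclusively(psi)
```

The conditional output state equals the input, and the probability does not depend on the input, which is the content of the teleportation condition with $p_A = 1/d^2$ in this quantum example.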

5.2.6. The teleportation upper bound

Combined with Local Tomography, the teleportation diagram allows us to
upper bound the dimension of the state space. The idea is to write the teleportation
diagram in matrix elements, by expanding $\Phi$ and $E$ as

\[ \Phi = \sum_{i,k} \Phi_{ik} \, (v_i \otimes w_k) \qquad \text{and} \qquad E = \sum_{j,l} E_{jl} \, (w_j^* \otimes v_l^*) , \tag{49} \]

with suitable bases $\{v_i\}$ and $\{w_k\}$. In this representation, Eq. (47) becomes

\[ [\Phi E]_{il} = p_A \, \delta_{il} , \tag{50} \]

and, taking the trace,

\[ \mathrm{Tr}[\Phi E] = p_A \, D_A . \tag{51} \]

On the other hand, we have

\[ \mathrm{Tr}[\Phi E] = (E|_{\bar A A} \, |\Phi)_{A \bar A} \le 1 , \tag{52} \]

which combined with Eq. (51) leads to the bound

\[ D_A \le \frac{1}{p_A} . \tag{53} \]

Clearly, in order to have the best bound we need to find the maximum probability
of teleportation. To discover what the maximum is, we need to move our
attention to the distinguishability structures implied by our axioms.

5.3. Distinguishability structures 
5.3.1. No disturbance without information 

Our first move is to derive a simple result about the structure of measurements:
a measurement that extracts no information from a face of the state
space can be implemented without disturbing that face. By face of the state
space we mean a face of the convex set of deterministic states. We say that
the measurement $m \in \mathrm{Tests}(A \to I, X)$ does not extract information from the
face $F$ iff there exists a set of probabilities $\{p_x\}_{x \in X}$ such that

\[ (m_x|\tau) = p_x \qquad \forall x \in X , \ \forall \tau \in F . \]

Also, we say that a test $\mathcal{T} \in \mathrm{Tests}(A \to A, X)$ does not disturb the face $F$ iff
$\sum_{x \in X} \mathcal{T}_x |\tau) = |\tau)$ for every state $\tau \in F$.

With this terminology, our result is the following:

Proposition 9. If a measurement $m$ does not extract information from the face
$F$, then there exists a test $\mathcal{T}$ that realizes the measurement, namely
$(e_A| \mathcal{T}_x = (m_x|$ for every $x \in X$, and does not disturb $F$.

This result has two important consequences. First, it allows us to establish
whether or not a set of perfectly distinguishable states can be extended:

Proposition 10. Let $S = \{\rho_x \mid x \in X\}$ be a set of perfectly distinguishable
states and let $\omega_S$ be its barycenter, defined as

\[ \omega_S := \frac{1}{|X|} \sum_{x \in X} \rho_x . \]

Then, the following are equivalent:

1. the set $S$ is maximal, i.e. no other set $S' \supset S$ can consist of perfectly
distinguishable states

2. the barycenter of $S$ is internal.

Another important consequence is that only the pure maximal sets can have
maximum cardinality:

Proposition 11. Let $S$ be a maximal set of perfectly distinguishable states of
system $A$. If one of the states in $S$ is not pure, then there exists another maximal
set $S' \subset \mathrm{St}(A)$, consisting only of pure states and having strictly larger
cardinality $|S'| > |S|$.

Combining the above points we have that every pure state belongs to some
maximal set of perfectly distinguishable pure states. For short, we call
such sets pure maximal sets.

5.3.2. Duality between pure states and pure effects 

For a pure maximal set $S$, we observe that the measurement that distinguishes
the states in $S$ must consist of pure effects. Hence, for every pure state
$\alpha \in \mathrm{PurSt}(A)$ there exists a pure effect $a$ such that $(a|\alpha) = 1$. Expanding
on this observation, we establish a one-to-one correspondence between pure
normalized states and pure normalized effects, denoted by $\mathrm{PurSt}_1(A)$ and
$\mathrm{PurEff}_1(A)$, respectively.

Footnote: We recall that a face of a convex set $C$ is a convex subset $F \subseteq C$
satisfying the condition that, for every $x \in F$, if $x$ is a non-trivial convex
combination of $x_1$ and $x_2$ with $x_1, x_2 \in C$, then $x_1$ and $x_2$ belong to $F$.

Theorem 4. For every system $A \in \mathrm{Sys}$, there exists a one-to-one map
$\dagger : \mathrm{PurSt}_1(A) \to \mathrm{PurEff}_1(A)$, sending pure normalized states to pure normalized
effects and satisfying the condition

\[ (\alpha^\dagger|\alpha) = 1 \qquad \forall \alpha \in \mathrm{PurSt}_1(A) . \]

The proof is rather elaborate. The two main steps are

1. proving that every pure normalized effect $a$ identifies a pure state $\alpha$, meaning
that $(a|\rho) = 1$ if and only if $\rho = \alpha$.

2. proving that, if two pure effects identify the same state, then they must
coincide.

The second step uses Pure Steering in an essential way, suggesting that the
distinguishability features of quantum theory are deeply connected with its
correlation features.

5.3.3. The informational dimension 

An easy consequence of the state-effect duality is that every two pure normalized
effects are connected by a reversible transformation, just like the pure
states. In turn, this leads to a useful result:

Proposition 12. For a given system $A \in \mathrm{Sys}$, all pure maximal sets have the
same cardinality.

The proof idea is simple: let $a = \{a_x\}_{x \in X}$ be the measurement that distinguishes
among the states in a pure maximal set $S = \{\alpha_x \mid x \in X\}$. As
we already observed, all the effects in $a$ must be pure. Since every two pure
normalized effects are connected by a reversible transformation, we must have
$a_x = a \circ \mathcal{U}_x$ for every $x \in X$, where $a$ is a fixed (but otherwise arbitrary) effect in
$\mathrm{PurEff}_1(A)$ and $\mathcal{U}_x$ is a reversible transformation. Applying the effects to the
invariant state $\chi$, which satisfies $\mathcal{U}_x \chi = \chi$, we then obtain

\[ (a_x|\chi) = (a|\chi) \qquad \forall x \in X , \]

and summing over $x$ we get the equality $1 = |X| \, (a|\chi)$. Hence, the cardinality
of the maximal set $S$ is $|S| = |X| = 1/(a|\chi)$. Since $S$ is a generic pure maximal
set, we proved the desired result.

In the following, the cardinality of the pure maximal sets in $A$ will be denoted by
$d_A$. We call it the informational dimension, because it is the number of distinct
classical messages that can be encoded in system $A$ and decoded without error.
In quantum theory, $d_A$ is the dimension of the Hilbert space associated to
system $A$.

Footnote: We call an effect $a$ of system $A$ normalized iff there exists a state
$\rho$ such that $(a|\rho) = 1$.

For composite systems, the informational dimension has the product form:

Proposition 13. For every pair of systems $A$ and $B$ one has $d_{A \otimes B} = d_A d_B$.

The reason is simply that the product of two pure maximal sets for systems
$A$ and $B$ is a pure maximal set for $A \otimes B$: it is pure, because the product of
two pure states is pure (by Local Tomography), and it is maximal because the
product of two internal states is internal (again, by Local Tomography). Hence,
maximality follows by proposition 10.

5.3.4. The spectral theorem

An important consequence of the state-effect duality is the ability to decompose
every state as a mixture of perfectly distinguishable pure states. The
crucial step is to prove such a decomposition for the invariant state:

Lemma 1. For every pure maximal set $\{\alpha_x\}_{x=1}^{d_A} \subset \mathrm{PurSt}(A)$ one has

\[ \chi = \frac{1}{d_A} \sum_{x=1}^{d_A} \alpha_x . \]

This result is extremely important, because it helps us to cope with the
existence of different maximal sets of pure states. To begin with, it allows us to
prove the analogue of the spectral theorem:

Theorem 5 (Spectral Decomposition). For every vector $v \in \mathrm{St}_{\mathbb{R}}(A)$ there exists
a pure maximal set $\{\alpha_x\}_{x=1}^{d_A} \subset \mathrm{PurSt}(A)$ and a set of real coefficients
$\{c_x\}_{x=1}^{d_A}$ such that

\[ v = \sum_{x=1}^{d_A} c_x \, \alpha_x . \tag{54} \]

Similarly, for every vector $w \in \mathrm{Eff}_{\mathbb{R}}(A)$ there exists a pure discriminating measurement
$\{a_x\}_{x=1}^{d_A}$ and a set of real coefficients $\{d_x\}_{x=1}^{d_A}$ such that

\[ w = \sum_{x=1}^{d_A} d_x \, a_x . \tag{55} \]
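In quantum theory, Eq. (54) reduces to the familiar spectral decomposition of a Hermitian matrix. A minimal numerical sketch, with an arbitrary Hermitian matrix `v` chosen only for illustration:

```python
import numpy as np

# A Hermitian matrix standing in for a vector v of St_R(A) (illustrative values)
v = np.array([[0.5, 0.2 - 0.1j],
              [0.2 + 0.1j, -0.3]])

coeffs, vecs = np.linalg.eigh(v)     # real coefficients c_x and orthonormal eigenvectors

# Rank-one projectors play the role of the pure maximal set {alpha_x}
alphas = [np.outer(vecs[:, x], vecs[:, x].conj()) for x in range(len(coeffs))]

reconstructed = sum(c * a for c, a in zip(coeffs, alphas))
overlap = np.trace(alphas[0] @ alphas[1])   # zero for perfectly distinguishable states
```

The coefficients returned by `eigh` are real but not necessarily positive, matching the statement of the theorem for general vectors rather than states only.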


5.3.5. Orthogonal faces 

Thanks to spectrality, it is easy to retrieve the basic structures of quantum
logic. In general, the faces of a convex set $C$ form a bounded lattice, with
partial order $\le$ corresponding to set-theoretic inclusion and with meet and join
operations defined as $F \wedge G := F \cap G$ and $F \vee G := \bigcap \{ H \mid F \subseteq H, \ G \subseteq H \}$,
respectively. The lattice is bounded, with the convex set $C$ being the top element
and the empty set $\emptyset$ being the bottom element. Hence, the set of deterministic
states $C_A := \mathrm{DetSt}(A)$ in a convex theory can be seen as a lattice in the above
way. However, our axioms imply much more: according to them, the faces of
the state space form an orthomodular lattice, i.e. a lattice with an operation of
orthogonal complement $\perp$ satisfying the orthomodularity condition
$F \le G \Longrightarrow G = F \vee (G \wedge F^\perp)$.

Let us see why this is the case. For a given face $F \subseteq C_A$ we can pick a set of
perfectly distinguishable pure states $S_F = \{\alpha_x\}_{x=1}^{d_F} \subset F$ that is maximal in $F$,
meaning that no other state in $F$ can be distinguished perfectly from the states
in $S_F$. Then, we can define the barycenter of $F$ as

\[ \omega_F := \frac{1}{d_F} \sum_{x=1}^{d_F} \alpha_x . \tag{56} \]

Since the face $F$ can be compressed into the state space of a smaller system,
lemma 1 guarantees that the definition of the state $\omega_F$ depends only on $F$, and
not on the maximal set $S_F$. In other words, Eq. (56) sets up a one-to-one
correspondence between faces and their barycenters.

Now, we can extend the set $S_F$ to a pure maximal set for system $A$, say
$\{\alpha_x\}_{x=1}^{d_A}$. Let us define the set $S_{F^\perp} := \{\alpha_x\}_{x=d_F+1}^{d_A}$ and denote by $F^\perp$ the
smallest face containing $S_{F^\perp}$. By construction, it is easy to verify that the set
$S_{F^\perp}$ is maximal in $F^\perp$ and therefore

\[ \omega_{F^\perp} = \frac{1}{d_A - d_F} \sum_{x=d_F+1}^{d_A} \alpha_x . \]

$F^\perp$ can be equivalently characterized as the face containing all the states
that are perfectly distinguishable from $F$. Moreover, it is not hard to show that

1. $F \vee F^\perp = C_A$

2. $F \wedge F^\perp = \emptyset$

3. $(F^\perp)^\perp = F$

4. $F \le G \Longrightarrow G^\perp \le F^\perp$

5. $F \le G \Longrightarrow G = F \vee (G \wedge F^\perp)$,

where the last two properties are proven by picking a pure maximal set for $F$,
extending it to a pure maximal set for $G$, and extending the latter to a pure
maximal set for the whole convex set $C_A$. Properties 1-4 show that the operation
$\perp$ is an orthogonal complement, while property 5 is the orthomodularity
condition. Hence, we obtained that the set of faces must be an orthomodular
lattice.


5.3.6. Orthogonal effects 

By the state-effect duality, we can associate every face $F$ with an effect $a_F$,
defined as

\[ a_F := \sum_{x=1}^{d_F} \alpha_x^\dagger , \tag{57} \]

where $S_F = \{\alpha_x\}_{x=1}^{d_F}$ is a pure maximal set in $F$. Again, it is easy to see that
the definition of $a_F$ is independent of the choice of maximal set $S_F$. Indeed, by
definition one has $a_F + a_{F^\perp} = e_A$ for every pure maximal set $S_{F^\perp}$. Varying $S_F$
without varying $S_{F^\perp}$ shows that the definition of $a_F$ depends only on $F$.

Thanks to the spectral theorem, $a_F$ can be operationally characterized as the
only effect that happens with unit probability on $F$ and with zero probability
on $F^\perp$:

Proposition 14. $a_F$ is the unique effect $a \in \mathrm{Eff}(A)$ satisfying the conditions

\[ (a|\rho) = 1 \quad \forall \rho \in F , \qquad (a|\sigma) = 0 \quad \forall \sigma \in F^\perp . \]

For this reason, we call $a_F$ the identifying effect of the face $F$. The set of
identifying effects inherits the structure of orthomodular lattice from the set of
faces, via the following definitions

1. $a_F \preceq a_G$ iff $F \le G$,

2. $a_F \wedge a_G := a_{F \wedge G}$,

3. $a_F \vee a_G := a_{F \vee G}$, and

4. $a_F^\perp := a_{F^\perp}$.

In quantum theory, the lattice of identifying effects is the lattice of projectors on
subspaces of the Hilbert space. It is easy to see that the partial order $\preceq$ coincides
with the partial order $\le$ induced by the probabilities, namely $a_F \preceq a_G$ if and
only if $(a_F|\rho) \le (a_G|\rho)$ for every state $\rho$.

5.3.7. Orthogonal projections

Faces of the state space can also be associated with physical transformations,
in the following way:

Definition 18. A transformation $\Pi_F \in \mathrm{Transf}(A \to A)$ is an orthogonal projection
on the face $F \subseteq C_A$ iff the following conditions are satisfied:

\[ \Pi_F \, |\rho) = |\rho) \qquad \forall \rho \in F \tag{59} \]

\[ \Pi_F \, |\sigma) = 0 \qquad \forall \sigma \in F^\perp . \tag{60} \]

Footnote: In the original work, we also required that projections be pure.
However, in the context of our axioms, purity is implied by the two conditions in
the present definition. A sketch of proof is the following: First, one can prove
that for every pure state $\alpha \in F$ one must have $(\alpha^\dagger| \Pi_F = (\alpha^\dagger|$ (this follows
from the definition and from proposition 14). As a consequence, one also has
$(a_F| \Pi_F = (a_F|$. This implies that, for every state $\rho \in \mathrm{St}(A)$, the unnormalized
state $\Pi_F |\rho)$ is proportional to a state in $F$. Now, for two projections $\Pi_F$ and
$\Pi'_F$ one must have

\[ (\alpha^\dagger| \Pi_F |\rho) = (\alpha^\dagger|\rho) = (\alpha^\dagger| \Pi'_F |\rho) , \tag{58} \]

for every pure state $\alpha \in F$. Since the states $\Pi_F |\rho)$ and $\Pi'_F |\rho)$ are proportional
to states in $F$ and $\alpha \in F$ is a generic pure state, Ideal Compression implies
$\Pi_F |\rho) = \Pi'_F |\rho)$, or equivalently, $\Pi_F = \Pi'_F$, because the state $\rho$ is generic.

The definition is non-empty: thanks to Purification and Purity of Composition,
we are able to construct a pure projection $\Pi_F$ for every face $F$. Moreover,
it follows from the definition that the projection $\Pi_F$ is unique.

In addition to purity, projections have a number of properties, including

1. $(a_{F^\perp}| \Pi_F = 0$

2. $(a_G| \Pi_F = (a_G|$ whenever $G \le F$

3. for every input state $\rho$, the normalized output state $\tau := \Pi_F |\rho) / (e_A| \Pi_F |\rho)$
belongs to $F$

4. $\Pi_G \Pi_F = \Pi_F \Pi_G = \Pi_G$ whenever $G \le F$.

5.4. Interaction between correlation and distinguishability structures

We have seen that our axioms imply peculiar features, both in the way
systems correlate and in the way states can be distinguished. It is time to
combine these two types of features and to explore the consequences.

5.4.1. The Schmidt bases

Combining Pure Steering and Spectral Decomposition, we are now in a position
to give the operational version of the Schmidt bases in quantum theory.
The result can be summarized as follows:

Proposition 15. Let $\Psi$ be a pure state of $A \otimes B$ and let $\rho_A$ and $\rho_B$ be its
marginals on systems $A$ and $B$, respectively. Then, for every spectral decomposition

\[ \rho_A = \sum_{x=1}^{r} p_x \, \alpha_x , \]

there exists a set of perfectly distinguishable pure states $\{\beta_x\}_{x=1}^{r} \subset \mathrm{PurSt}(B)$
such that

\[ \rho_B = \sum_{x=1}^{r} p_x \, \beta_x . \tag{61} \]

Moreover, one has

\[ (a_x \otimes b_y|\Psi) = \begin{cases} p_x \, \delta_{xy} & x, y \in \{1, \dots, r\} \\ 0 & x, y \notin \{1, \dots, r\} \end{cases} \tag{62} \]

for every two measurements $a = \{a_x\}_{x=1}^{d_A}$ and $b = \{b_y\}_{y=1}^{d_B}$ satisfying $a_x = \alpha_x^\dagger$
and $b_y = \beta_y^\dagger$ for every $x \le r$ and for every $y \le r$.

In particular, applying the result to the Bell state $\Phi$, we obtain that the
invariant state $\chi_{\bar A}$ can be decomposed as $\chi_{\bar A} = \frac{1}{d_A} \sum_{x=1}^{d_A} \beta_x$ for a suitable set
of perfectly distinguishable pure states $\{\beta_x\}_{x=1}^{d_A}$. This implies that
conjugate systems have the same informational dimension:

Corollary 1. For every system $A$, one has $d_{\bar A} = d_A$.

Combined with the fact that the informational dimension is multiplicative
(proposition 13), the above result implies that the composite system $A \otimes \bar A$ has
informational dimension

\[ d_{A \otimes \bar A} = d_A^2 . \]
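In the Hilbert space formalism, Proposition 15 corresponds to the Schmidt decomposition, which can be computed with a singular value decomposition. The sketch below uses a random pure state of a qubit and a qutrit; the dimensions, the random seed, and the variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
m /= np.linalg.norm(m)                 # m[a, b]: amplitudes of a pure state of A (x) B

U, s, Vh = np.linalg.svd(m, full_matrices=False)
p = s ** 2                             # Schmidt probabilities p_x, in decreasing order

rho_A = m @ m.conj().T                 # marginal on A
rho_B = m.T @ m.conj()                 # marginal on B

# Joint probabilities of the measurements built from the two Schmidt bases:
# the amplitude <u_x| (x) <w_y| Psi> is the (x, y) entry of U^dagger m Vh^dagger.
amps = U.conj().T @ m @ Vh.conj().T
joint = np.abs(amps) ** 2
```

Both marginals share the nonzero spectrum `p`, and `joint` is diagonal with entries `p`, mirroring Eqs. (61) and (62).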

5.4.2. The maximum probability of conclusive teleportation

In our construction of conclusive teleportation, the teleportation probability
was equal to the probability of the state $\Phi$ in an ensemble decomposition of the
invariant state $\chi_A \otimes \chi_{\bar A}$ [cf. Eq. (46)]. Now, since $\chi_A \otimes \chi_{\bar A}$ is the invariant state
of the composite system $A \otimes \bar A$, it can be decomposed as

\[ \chi_A \otimes \chi_{\bar A} = \frac{1}{d_A^2} \sum_{x=1}^{d_A^2} \Psi_x , \]

for every pure maximal set $\{\Psi_x\}_{x=1}^{d_A^2}$. The maximum probability of the Bell
state in a convex decomposition of $\chi_A \otimes \chi_{\bar A}$ is then given by

\[ p_A^{\max} = \frac{1}{d_A^2} . \tag{63} \]

Inserting the above equality into the teleportation upper bound (53) we obtain
the relation

\[ D_A \le d_A^2 . \tag{64} \]

In the next paragraph we will see how to obtain the converse inequality.

5.4.3. The teleportation lower bound

Thanks to the state-effect duality, it is possible to establish a lower bound
on the state space dimension. The proof is a little bit laborious and consists of
two steps:

1. show that the effect $\Phi^\dagger$ that identifies the Bell state is of the form

\[ (\Phi^\dagger|_{A \bar A} = (E|_{\bar A A} \; \mathcal{S} \, (\mathcal{U} \otimes \mathcal{I}_{\bar A}) , \]

where $E$ is the effect achieving maximum teleportation probability, $\mathcal{S}$
is the swap, and $\mathcal{U}$ is some reversible transformation.

2. show that, with a suitable choice of basis for the vector space $\mathrm{St}_{\mathbb{R}}(A)$,
every reversible transformation $\mathcal{U}$ is represented by an orthogonal matrix
$M_{\mathcal{U}}$.

Once these two results are established, we can expand the Bell state and the
teleportation effect $E$ as in Eq. (49), thus obtaining

\[ 1 = (\Phi^\dagger|\Phi) = \mathrm{Tr}[\Phi E M_{\mathcal{U}}] = p_A^{\max} \, \mathrm{Tr}[M_{\mathcal{U}}] \le p_A^{\max} D_A , \tag{65} \]

having used the teleportation equality $\Phi E = p_A^{\max} I_{D_A}$ and the fact that the
trace of an orthogonal matrix cannot be larger than the trace of the identity.
Hence, we obtained the teleportation lower bound

\[ D_A \ge \frac{1}{p_A^{\max}} . \tag{66} \]

Combining the teleportation lower bound with Eqs. (63) and (64), we obtain
the equality

\[ D_A = d_A^2 . \tag{67} \]
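For quantum systems the equality (67) can be checked directly: the real span of the pure states of a $d$-level system has dimension $d^2$. The sketch below builds $d^2$ pure states and computes the rank of their real span; the particular choice of states and the function name are assumptions made for the illustration.

```python
import numpy as np
from itertools import combinations

def pure_state_span_dimension(d):
    """Dimension of the real linear span of d**2 suitably chosen pure states."""
    basis = [np.eye(d, dtype=complex)[i] for i in range(d)]
    kets = list(basis)
    for i, j in combinations(range(d), 2):
        kets.append((basis[i] + basis[j]) / np.sqrt(2))        # real superposition
        kets.append((basis[i] + 1j * basis[j]) / np.sqrt(2))   # complex superposition
    rows = []
    for k in kets:
        P = np.outer(k, k.conj())                              # density matrix |k><k|
        rows.append(np.concatenate([P.real.ravel(), P.imag.ravel()]))
    return np.linalg.matrix_rank(np.array(rows))

dims = {d: pure_state_span_dimension(d) for d in (2, 3, 4)}
```

Here the informational dimension is $d$, while the state space dimension counted by the rank is $d^2$, as in $D_A = d_A^2$.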


5.5. Qubit structures 

So far, we avoided giving a concrete representation of our state spaces: all 
the quantum features that we have shown followed directly from the principles. 
We now proceed to analyze some features that are more closely related to the 
concrete geometrical shape of the quantum state spaces. We will first see that 
all two-dimensional systems in our theory have qubit state spaces. Leveraging 
on this fact, we will then derive two features of higher-dimensional systems: i) 
an operational version of the superposition principle, and ii) the fact that all 
systems of the same dimension are operationally equivalent. 

5.5.1. Derivation of the qubit 

Showing that the states of a two-dimensional system can be described by density
matrices is quite easy. This can be done geometrically, by showing that the
deterministic states form a 3-dimensional Euclidean ball. The 3-dimensionality
is obvious from the equality $D_A = d_A^2$, which for $d_A = 2$ implies that the convex
set $C_A = \mathrm{DetSt}(A)$ is a three-dimensional manifold. Then, we can make a
simple geometrical reasoning:

1. all the pure states are generated from a fixed pure state by application of
reversible transformations, and, by choosing a suitable basis for the state
space, such transformations act in the 3-dimensional space as orthogonal
matrices.

2. all states on the border of $C_A$ are pure; otherwise, Perfect State Discrimination
and proposition 11 would imply $d_A > 2$. This means that, if we
move away from the invariant state $\chi_A$ in an arbitrary direction, at some
point we will hit a pure state.


Footnote: In general, the dimension of the convex set $C_A$ is given by $D_A - 1$.

In the ordinary 3-dimensional space, the ball is the only (closed) 3-dimensional
convex set generated by orthogonal matrices and with only pure states on the
border.

Once we established that the convex set $C_A$ is a ball, we can represent
every normalized state $\rho \in C_A$ with a density matrix $S_\rho$. In particular, the pure
states will be of the form

\[ S_\alpha = \begin{pmatrix} p & \sqrt{p(1-p)} \, e^{-i\theta} \\ \sqrt{p(1-p)} \, e^{i\theta} & 1-p \end{pmatrix} = |\alpha\rangle\langle\alpha| , \qquad |\alpha\rangle := \sqrt{p} \, |0\rangle + e^{i\theta} \sqrt{1-p} \, |1\rangle , \tag{68} \]

for some probability $p \in [0,1]$ and some phase $\theta \in [0, 2\pi)$. Once we have chosen
this representation, it is obvious that every effect $a \in \mathrm{Eff}(A)$ must be described
by a positive semidefinite matrix $E_a$ upper bounded by the identity and that
probabilities are given by the Born rule

\[ (a|\rho) = \mathrm{Tr}[E_a S_\rho] . \tag{69} \]

Moreover, the state-effect duality imposes that all such matrices represent valid
effects.
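The parametrization (68) and the Born rule (69) can be illustrated numerically. In the sketch below, the values of `p` and `theta` and the function names are illustrative assumptions; the Bloch vector is computed with the Pauli matrices.

```python
import numpy as np

def pure_qubit_state(p, theta):
    """Density matrix |alpha><alpha| with |alpha> = sqrt(p)|0> + e^{i theta} sqrt(1-p)|1>."""
    ket = np.array([np.sqrt(p), np.exp(1j * theta) * np.sqrt(1 - p)])
    return np.outer(ket, ket.conj())

def bloch_vector(S):
    """Coordinates of the state in the 3-dimensional ball."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return np.array([np.trace(S @ P).real for P in (X, Y, Z)])

def born(E, S):
    """Probability of the effect E on the state S."""
    return np.trace(E @ S).real

p, theta = 0.25, 1.0
S = pure_qubit_state(p, theta)
radius = np.linalg.norm(bloch_vector(S))            # pure states lie on the border
prob_zero = born(np.diag([1.0, 0.0]).astype(complex), S)
```

A pure state sits at unit distance from the center of the ball, while the invariant (maximally mixed) state sits at the center.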

5.5.2. The superposition principle 

Pure states in quantum theory satisfy the so-called “superposition principle”, 
which just means that they are in one-to-one correspondence with the rays of 
the underlying Hilbert space. Per se, this statement has hardly any operational 
meaning. However, one can formulate an operational version in general OPTs: 

Definition 19 (Superposition Principle). We say that system $A$ satisfies the
superposition principle iff for every pure maximal set $S = \{\alpha_x \mid x \in X\} \subset
\mathrm{PurSt}(A)$ and for every probability distribution $\{p_x\}_{x \in X}$ there exists one pure
state $\psi_p$ such that

\[ (a_x|\psi_p) = p_x \qquad \forall x \in X , \tag{70} \]

for every measurement $a = \{a_x\}_{x \in X}$ that perfectly distinguishes among the
states in the maximal set $S$.

Now, in a theory satisfying our principles we know that the two-dimensional
systems are quantum, and therefore satisfy the superposition principle. Thanks
to Ideal Compression, it is then easy to generalize the result to systems of arbitrary
dimension: given two perfectly distinguishable pure states, one can encode
them into a two-dimensional system, use the Bloch sphere representation to find
the superposition state, and come back with the decoding operation. Iterating
this procedure, we can superpose any number of perfectly distinguishable pure
states.

As a simple application of the superposition principle, we obtain the following

Proposition 16. A state $\rho_A$ with spectral decomposition $\rho_A = \sum_{x=1}^{r} p_x \, \alpha_x$ has
a purification with purifying system $B$ if and only if $d_B \ge r$.

The “only if” part was already clear from the Schmidt decomposition. For
the “if” part, it is enough to pick $r$ perfectly distinguishable pure states of $B$, say
$\{\beta_x\}_{x=1}^{r}$, and to superpose the product states $\{\alpha_x \otimes \beta_x\}_{x=1}^{r}$ with probabilities
$\{p_x\}_{x=1}^{r}$. The resulting pure state $\Psi \in \mathrm{PurSt}(A \otimes B)$ is the desired purification.

5.5.3. The superposition principle for transformations 

The superposition principle allows us to glue distinguishable states in any 
way we like. Thanks to the state-transformation isomorphism, we can extend 
this idea to transformations. For example, consider a set of pure transformations 
$\{\mathcal{A}_x \mid x \in X\} \subset \mathsf{PurTransf}(A \to B)$ and suppose 
that they have orthogonal support, that is, that there exists a set of 
orthogonal faces $\{F_x \mid x \in X\}$ such that 

$$ \mathcal{A}_x = \mathcal{A}_x \Pi_{F_x}, \qquad \forall x \in X. \tag{71} $$

Then, it is possible to find a pure transformation 
$\mathcal{A} \in \mathsf{PurTransf}(A \to B)$ such that 

$$ \mathcal{A} \Pi_{F_x} = \mathcal{A}_x, \qquad \forall x \in X. \tag{72} $$

The result follows by noticing that the Choi states 
$\{\Phi_{\mathcal{A}_x} \mid x \in X\}$ are proportional to pure and perfectly 
distinguishable states and by applying the superposition principle to the 
corresponding normalized states. 
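
Relations (71) and (72) can be checked on a small quantum example. The sketch 
below assumes that pure transformations are represented by single Kraus 
operators and faces by orthogonal projectors; the matrices and helper names are 
illustrative choices, not taken from the text. It verifies only the algebraic 
relations, and the example is chosen so that the summed operator is a valid 
(norm non-increasing) Kraus operator, which is not guaranteed for an arbitrary 
sum. 

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matsum(mats):
    """Entrywise sum of a list of equally shaped matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(cols)]
            for i in range(rows)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two equally shaped matrices."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Orthogonal faces of C^3: span{e0} and span{e1, e2} (illustrative choice).
P = [
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 1]],
]
# Kraus operators from C^3 to C^2, each supported on one face.
A_x = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.6, 0.8]],
]

A = matsum(A_x)  # candidate superposition of the two transformations

eq71_holds = all(close(matmul(a, p), a) for a, p in zip(A_x, P))
eq72_holds = all(close(matmul(A, p), a) for a, p in zip(A_x, P))
print(eq71_holds, eq72_holds)
```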


5.5.4. Equivalence of pure maximal sets up to reversible transformations 

Using the superposition principle for transformations we can prove that all 
pure maximal sets of the same cardinality are equivalent: 

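Proposition 17. Let $\{\alpha_x\}_{x=1}^{d_A}$ and $\{\beta_y\}_{y=1}^{d_B}$ be 
pure maximal sets for systems A and B, respectively. If $d_A = d_B$, then there 
exists a reversible transformation $\mathcal{U} \in \mathsf{Transf}(A \to B)$ such that 

$$ \mathcal{U}|\alpha_x) = |\beta_x), \qquad \forall x \in \{1, \dots, d_A\}. $$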


The result follows immediately from the application of the superposition 
principle to the pure transformations 
$\mathcal{A}_x = |\beta_x)(\alpha_x^\dagger|$. As a corollary, we have 
that all systems of the same dimension are operationally equivalent. 
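
In the quantum case the reversible transformation of Proposition 17 is the 
familiar change of basis. The following Python sketch assumes that pure maximal 
sets are orthonormal bases and that reversible transformations are unitary 
matrices; `basis_change`, `apply` and `is_unitary` are hypothetical helper names. 

```python
import math


def basis_change(alpha, beta):
    """Matrix U = sum_x |beta_x><alpha_x| for two orthonormal bases."""
    d = len(alpha)
    return [[sum(beta[x][i] * complex(alpha[x][j]).conjugate()
                 for x in range(d))
             for j in range(d)] for i in range(d)]


def apply(u, v):
    """Matrix-vector product."""
    return [sum(u[i][j] * v[j] for j in range(len(v))) for i in range(len(u))]


def is_unitary(u, tol=1e-12):
    """Check U U^dagger = identity entry by entry."""
    d = len(u)
    for i in range(d):
        for j in range(d):
            entry = sum(u[i][k] * u[j][k].conjugate() for k in range(d))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True


s = 1 / math.sqrt(2)
alpha = [(s, 1j * s), (s, -1j * s)]   # eigenbasis of the Pauli Y matrix
beta = [(1, 0), (0, 1)]               # computational basis

U = basis_change(alpha, beta)
images = [apply(U, a) for a in alpha]
print(images)  # each alpha_x is mapped to beta_x, up to rounding
```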


5.6. The density matrix 

We have finally reached the end of the reconstruction. It is now time to enter 
into the specific details of the Hilbert space formalism of quantum theory. Our 
strategy to reconstruct the Hilbert space formalism is to show that, for every 
system A, there exists a one-to-one linear map from the vector space 
$\mathsf{St}_{\mathbb{R}}(A)$ to the space of $d_A \times d_A$ Hermitian 
matrices, with the property that the convex set 
of deterministic states is mapped to the convex set of density matrices 
(non-negative matrices with unit trace). 

Let us see how this can be proven. Since the dimension of the state space 
satisfies the relation $D_A = d_A^2$, every vector 
$v \in \mathsf{St}_{\mathbb{R}}(A)$ can be represented as a 
square $d_A \times d_A$ real matrix $M_v$. In turn, the matrix $M_v$ can be 
turned into a complex Hermitian matrix $S_v$, applying the linear transformation 

$$ S_v := (M_v + M_v^T) + i\,(M_v - M_v^T), \tag{73} $$

where $M^T$ denotes the transpose of $M$. The problem is now to find a suitable 
representation in which normalized states $\rho$ of system A correspond to 
density matrices, that is $S_\rho \ge 0$ and $\mathrm{Tr}[S_\rho] = 1$. To find 
such a representation, we follow Hardy’s method [1^: we pick a pure maximal set 
$\{\alpha_m\}_{m=1}^{d_A}$ and define the diagonal elements of the matrix 
$S_\rho$ as 

$$ [S_\rho]_{mm} := (\alpha_m^\dagger|\rho). $$


In this way, we guarantee the unit-trace condition $\mathrm{Tr}[S_\rho] = 1$. 
To define the off-diagonal elements, we consider the two-dimensional faces 
$F_{mn} := \{\alpha_m\} \vee \{\alpha_n\}$, $n > m$. Projecting the state 
inside these faces, we obtain the states 

$$ |\rho^{mn}) := \frac{\Pi_{F_{mn}}|\rho)}{(e_A|\Pi_{F_{mn}}|\rho)}, \qquad n > m. $$

Since every state $\rho^{mn}$ belongs to a two-dimensional face, it can be encoded 
into a qubit system and can be associated with a density matrix $\tau^{mn}$. 
The off-diagonal elements $[S_\rho]_{mn}$ and $[S_\rho]_{nm}$ are defined in 
terms of the qubit density matrix as 

$$ [S_\rho]_{mn} := [\tau^{mn}]_{01} \quad \text{and} \quad [S_\rho]_{nm} := [\tau^{mn}]_{10}. $$

The matrix $S_\rho$ defined in this way is clearly Hermitian and, with a little 
bit of work, one can see that the linear map $\rho \mapsto S_\rho$ is one-to-one. 
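
The linear transformation of Eq. (73) is easy to check numerically. The Python 
sketch below applies it to a generic real square matrix and shows that the 
result is Hermitian and that the original matrix can be recovered. The 
functions `to_hermitian` and `from_hermitian` are hypothetical names, and the 
sketch does not implement the specific choice of matrix elements made in the 
text; it only illustrates why the step from real to Hermitian matrices loses no 
information. 

```python
def to_hermitian(m):
    """Eq. (73): S = (M + M^T) + i (M - M^T) for a real square matrix M."""
    d = len(m)
    return [[(m[i][j] + m[j][i]) + 1j * (m[i][j] - m[j][i])
             for j in range(d)] for i in range(d)]


def from_hermitian(s):
    """Inverse map: Re S = M + M^T and Im S = M - M^T, so M = (Re S + Im S) / 2."""
    d = len(s)
    return [[(s[i][j].real + s[i][j].imag) / 2 for j in range(d)]
            for i in range(d)]


def is_hermitian(s, tol=1e-12):
    d = len(s)
    return all(abs(s[i][j] - s[j][i].conjugate()) <= tol
               for i in range(d) for j in range(d))


M = [[1.0, 2.0, -0.5],
     [0.0, 3.0, 4.0],
     [7.0, -1.0, 0.25]]

S = to_hermitian(M)
M_back = from_hermitian(S)
print(is_hermitian(S), M_back == M)
```

Both spaces have real dimension d^2, so an invertible linear map between them 
is possible; the inverse above makes this explicit. 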

At this point the problem is to guarantee that the matrix $S_\rho$ is positive. 
We consider first the case of pure states $\alpha \in \mathsf{PurSt}(A)$, for 
which one has 

$$ [S_\alpha]_{mn} = \sqrt{p_m p_n}\, e^{i\theta_{mn}}, $$

where $\{p_m\}_{m=1}^{d_A}$ is a suitable probability distribution and 
$\{\theta_{mn}\}$ are phases satisfying the conditions $\theta_{mm} = 0$ for 
every m and $\theta_{nm} = -\theta_{mn}$ for every n > m. This expression 
follows from the fact that each state 
$|\alpha^{mn}) = \Pi_{F_{mn}}|\alpha)/(e_A|\Pi_{F_{mn}}|\alpha)$ is pure and, 
once encoded into a qubit, it has a density matrix of the form (68). In order 
to prove positivity, we need to show that the phases are of the form 
$\theta_{mn} = \gamma_m - \gamma_n$, for some phases $\{\gamma_m\}$. The 
strategy is to prove the result first in dimension $d_A = 3$ and then to extend 
it to arbitrary dimensions. 
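
A small numerical example shows why the condition on the phases is needed, and 
why dimension three is the first non-trivial case. The Python sketch below uses 
hypothetical helpers `pure_state_matrix` and `quadratic_form`. It compares 
phases of the form gamma_m - gamma_n with a set of phases that is antisymmetric 
with zero diagonal but is not of that form; the second matrix is Hermitian but 
fails to be positive, as witnessed by an explicit vector chosen for this 
illustration. 

```python
import cmath
import math


def pure_state_matrix(probs, theta):
    """Matrix with entries sqrt(p_m p_n) * exp(i * theta[m][n])."""
    d = len(probs)
    return [[math.sqrt(probs[m] * probs[n]) * cmath.exp(1j * theta[m][n])
             for n in range(d)] for m in range(d)]


def quadratic_form(s, v):
    """Real part of v^dagger S v; for Hermitian S the value is real."""
    d = len(v)
    total = sum(complex(v[m]).conjugate() * s[m][n] * v[n]
                for m in range(d) for n in range(d))
    return total.real


probs = [1 / 3, 1 / 3, 1 / 3]

# Case 1: theta_mn = gamma_m - gamma_n, so S is a rank-one projector.
gamma = [0.3, -1.1, 2.0]
theta_good = [[gm - gn for gn in gamma] for gm in gamma]
S_good = pure_state_matrix(probs, theta_good)

# Case 2: antisymmetric phases with zero diagonal that are NOT differences.
theta_bad = [[0.0, 0.0, math.pi],
             [0.0, 0.0, 0.0],
             [-math.pi, 0.0, 0.0]]
S_bad = pure_state_matrix(probs, theta_bad)

witness = [1, -1, 1]
value_good = quadratic_form(S_good, witness)
value_bad = quadratic_form(S_bad, witness)  # negative, so S_bad is not positive
print(value_good, value_bad)
```

In dimension two every antisymmetric choice of phases is automatically a 
difference, so a failure of this kind can first occur for $d_A = 3$. 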

Once we have proven that pure states correspond to rank-one projectors, it 
remains to show that all such projectors correspond to pure states. This can 
be done by using the superposition principle (both for states and for reversible 
transformations). Having proven that the set of pure states is in one-to-one 
correspondence with the set of rank-one projectors, it follows by convexity that 
the set of states is in one-to-one correspondence with the set of density matrices. 
In short, all state spaces are quantum. 

To complete our reconstruction, we invoke Theorem 3, which guarantees that 
the tests in our theory are in one-to-one correspondence with the tests allowed by 
quantum theory. 

6. Conclusions 

Quantum theory can be rebuilt from bottom to top starting from six basic 
principles. The principles do not refer to specific physical systems such as 
particles or waves: instead, they are the rules that dictate how information 
can be processed. The first five principles—Causality, Purity of Composition, 
Local Tomography, Perfect State Discrimination, and Ideal Compression—can 
be thought of as requirements for a standard theory of information. Against the 
background of these five principles, the sixth, Purification, stands out as the 
quantum principle, which brings in counterintuitive features like entanglement, 
no cloning, and teleportation. Purification gives the agent the power to harness 
randomness, by simulating the preparation of every state through the 
preparation of a pure bipartite state. When this is done, the agent has an intrinsic 
guarantee that no side information can hide outside her control. The moral of 
our reconstruction is that quantum theory is the standard theory of information 
that allows for maximal control of randomness. 